Annotation of /openisis/current/doc/UniStat.txt

Revision 237
Mon Mar 8 17:43:12 2004 UTC by dpavlin
File MIME type: text/plain
File size: 13432 byte(s)
initial import of openisis 0.9.0 vendor drop

Unicode block statistics of general character categories,
decomposition and uppercase mappings, based on
Blocks.txt and UnicodeData.txt
> http://unicode.org/Public/UNIDATA/


* decomposition mappings
see
> http://unicode.org/Public/UNIDATA/UCD.html#Character_Decomposition_Mappings

$
(canonical) 1926
circle 229
compat 673
final 240
font 1038
fraction 16
initial 171
isolated 238
medial 82
narrow 122
noBreak 5
small 26
square 205
sub 24
super 100
vertical 25
wide 104
$
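
A minimal sketch of how counts like the ones above could be derived, assuming
the usual semicolon-separated UnicodeData.txt layout in which field 5 holds the
decomposition mapping and a leading <tag> marks a compatibility mapping.
The function names and the sample lines are illustrative only; this is not the
tool that produced the table.

```python
from collections import Counter

def decomposition_tag(field):
    """Return the tag of a decomposition field, '(canonical)' if the
    mapping carries no tag, or None if the field is empty."""
    if not field:
        return None
    if field.startswith("<"):
        return field[1:field.index(">")]
    return "(canonical)"

def count_decomposition_tags(lines):
    """Count decomposition tags over UnicodeData.txt-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        tag = decomposition_tag(fields[5])
        if tag is not None:
            counts[tag] += 1
    return counts

# Hypothetical four-line excerpt, for illustration only.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;;1/2;N;FRACTION ONE HALF;;;;",
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;",
]

if __name__ == "__main__":
    for tag, n in sorted(count_decomposition_tags(SAMPLE).items()):
        print(tag, n)
```

Run against a full UnicodeData.txt, the result depends on the Unicode version
of that file, so the numbers need not match the table above.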



* major category/block table
Categories are letter, mark, numeric, punctuation, symbol, separator and other.
Two additional columns (upC, deC) give the number of characters that have an
uppercase mapping and a canonical decomposition mapping, respectively.
The final columns give the first and last code point, the length and the name of each block.

$
Let Mar Num Pun Sym Sep Oth upC deC beg end len block
52 0 10 23 9 1 33 26 0 0000 007F 128 Basic Latin
65 0 6 5 18 1 33 32 53 0080 00FF 128 Latin-1 Supplement
128 0 0 0 0 0 0 63 108 0100 017F 128 Latin Extended-A
183 0 0 0 0 0 25 75 91 0180 024F 208 Latin Extended-B
96 0 0 0 0 0 0 19 0 0250 02AF 96 IPA Extensions
36 0 0 0 44 0 0 0 0 02B0 02FF 80 Spacing Modifier Letters
0 107 0 0 0 0 5 1 4 0300 036F 112 Combining Diacritical Marks
113 0 0 2 5 0 24 56 26 0370 03FF 144 Greek and Coptic
239 6 0 0 1 0 10 119 52 0400 04FF 256 Cyrillic
16 0 0 0 0 0 32 8 0 0500 052F 48 Cyrillic Supplementary
78 0 0 8 0 0 10 38 0 0530 058F 96 Armenian
30 47 0 5 0 0 30 0 0 0590 05FF 112 Hebrew
147 41 20 9 5 0 34 0 8 0600 06FF 256 Arabic
34 28 0 14 0 0 4 0 0 0700 074F 80 Syriac
39 11 0 0 0 0 14 0 0 0780 07BF 64 Thaana
66 26 10 3 0 0 23 0 11 0900 097F 128 Devanagari
52 19 16 0 3 0 38 0 5 0980 09FF 128 Bengali
51 16 10 0 0 0 51 0 6 0A00 0A7F 128 Gurmukhi
52 20 10 0 1 0 45 0 0 0A80 0AFF 128 Gujarati
53 17 10 0 1 0 47 0 5 0B00 0B7F 128 Oriya
35 14 12 0 8 0 59 0 4 0B80 0BFF 128 Tamil
51 19 10 0 0 0 48 0 1 0C00 0C7F 128 Telugu
53 19 10 0 0 0 46 0 5 0C80 0CFF 128 Kannada
52 16 10 0 0 0 50 0 3 0D00 0D7F 128 Malayalam
59 20 0 1 0 0 48 0 4 0D80 0DFF 128 Sinhala
57 16 10 3 1 0 41 0 0 0E00 0E7F 128 Thai
40 15 10 0 0 0 63 0 0 0E80 0EFF 128 Lao
47 74 20 20 32 0 63 0 17 0F00 0FFF 256 Tibetan
47 15 10 6 0 0 82 0 1 1000 109F 160 Myanmar
79 0 0 1 0 0 16 0 0 10A0 10FF 96 Georgian
240 0 0 0 0 0 16 0 0 1100 11FF 256 Hangul Jamo
317 0 20 8 0 0 39 0 0 1200 137F 384 Ethiopic
85 0 0 0 0 0 11 0 0 13A0 13FF 96 Cherokee
628 0 0 2 0 0 10 0 0 1400 167F 640 Unified Canadian Aboriginal Syllabics
26 0 0 2 0 1 3 0 0 1680 169F 32 Ogham
75 0 3 3 0 0 15 0 0 16A0 16FF 96 Runic
17 3 0 0 0 0 12 0 0 1700 171F 32 Tagalog
18 3 0 2 0 0 9 0 0 1720 173F 32 Hanunoo
18 2 0 0 0 0 12 0 0 1740 175F 32 Buhid
16 2 0 0 0 0 14 0 0 1760 177F 32 Tagbanwa
54 31 20 6 1 0 16 0 0 1780 17FF 128 Khmer
129 4 10 11 0 1 21 0 0 1800 18AF 176 Mongolian
29 24 10 2 1 0 14 0 0 1900 194F 80 Limbu
35 0 0 0 0 0 13 0 0 1950 197F 48 Tai Le
0 0 0 0 32 0 0 0 0 19E0 19FF 32 Khmer Symbols
108 0 0 0 0 0 20 0 0 1D00 1D7F 128 Phonetic Extensions
246 0 0 0 0 0 10 121 245 1E00 1EFF 256 Latin Extended Additional
218 0 0 0 15 0 23 97 229 1F00 1FFF 256 Greek Extended
0 0 0 60 2 16 34 0 2 2000 206F 112 General Punctuation
2 0 17 4 6 0 19 0 0 2070 209F 48 Superscripts and Subscripts
0 0 0 0 18 0 30 0 0 20A0 20CF 48 Currency Symbols
0 27 0 0 0 0 21 0 0 20D0 20FF 48 Combining Diacritical Marks for Symbols
43 0 0 0 32 0 5 0 3 2100 214F 80 Letterlike Symbols
0 0 49 0 0 0 15 16 0 2150 218F 64 Number Forms
0 0 0 0 112 0 0 0 6 2190 21FF 112 Arrows
0 0 0 0 256 0 0 0 38 2200 22FF 256 Mathematical Operators
0 0 0 5 204 0 47 0 2 2300 23FF 256 Miscellaneous Technical
0 0 0 0 39 0 25 0 0 2400 243F 64 Control Pictures
0 0 0 0 11 0 21 0 0 2440 245F 32 Optical Character Recognition
0 0 82 0 78 0 0 26 0 2460 24FF 160 Enclosed Alphanumerics
0 0 0 0 128 0 0 0 0 2500 257F 128 Box Drawing
0 0 0 0 32 0 0 0 0 2580 259F 32 Block Elements
0 0 0 0 96 0 0 0 0 25A0 25FF 96 Geometric Shapes
0 0 0 0 145 0 111 0 0 2600 26FF 256 Miscellaneous Symbols
0 0 30 14 130 0 18 0 0 2700 27BF 192 Dingbats
0 0 0 6 22 0 20 0 0 27C0 27EF 48 Miscellaneous Mathematical Symbols-A
0 0 0 0 16 0 0 0 0 27F0 27FF 16 Supplemental Arrows-A
0 0 0 0 256 0 0 0 0 2800 28FF 256 Braille Patterns
0 0 0 0 128 0 0 0 0 2900 297F 128 Supplemental Arrows-B
0 0 0 28 100 0 0 0 0 2980 29FF 128 Miscellaneous Mathematical Symbols-B
0 0 0 0 256 0 0 0 1 2A00 2AFF 256 Supplemental Mathematical Operators
0 0 0 0 14 0 242 0 0 2B00 2BFF 256 Miscellaneous Symbols and Arrows
0 0 0 0 115 0 13 0 0 2E80 2EFF 128 CJK Radicals Supplement
0 0 0 0 214 0 10 0 0 2F00 2FDF 224 Kangxi Radicals
0 0 0 0 12 0 4 0 0 2FF0 2FFF 16 Ideographic Description Characters
9 6 13 27 8 1 0 0 0 3000 303F 64 CJK Symbols and Punctuation
89 2 0 0 2 0 3 0 27 3040 309F 96 Hiragana
94 0 0 2 0 0 0 0 31 30A0 30FF 96 Katakana
40 0 0 0 0 0 8 0 0 3100 312F 48 Bopomofo
94 0 0 0 0 0 2 0 0 3130 318F 96 Hangul Compatibility Jamo
0 0 4 0 12 0 0 0 0 3190 319F 16 Kanbun
24 0 0 0 0 0 8 0 0 31A0 31BF 32 Bopomofo Extended
16 0 0 0 0 0 0 0 0 31F0 31FF 16 Katakana Phonetic Extensions
0 0 50 0 191 0 15 0 0 3200 32FF 256 Enclosed CJK Letters and Months
0 0 0 0 256 0 0 0 0 3300 33FF 256 CJK Compatibility
6582 0 0 0 0 0 10 0 0 3400 4DBF 6592 CJK Unified Ideographs Extension A
0 0 0 0 64 0 0 0 0 4DC0 4DFF 64 Yijing Hexagram Symbols
20902 0 0 0 0 0 90 0 0 4E00 9FFF 20992 CJK Unified Ideographs
1165 0 0 0 0 0 3 0 0 A000 A48F 1168 Yi Syllables
0 0 0 0 55 0 9 0 0 A490 A4CF 64 Yi Radicals
11172 0 0 0 0 0 12 0 0 AC00 D7AF 11184 Hangul Syllables
0 0 0 0 0 0 896 0 0 D800 DB7F 896 High Surrogates
0 0 0 0 0 0 128 0 0 DB80 DBFF 128 High Private Use Surrogates
0 0 0 0 0 0 1024 0 0 DC00 DFFF 1024 Low Surrogates
0 0 0 0 0 0 6400 0 0 E000 F8FF 6400 Private Use Area
361 0 0 0 0 0 151 0 349 F900 FAFF 512 CJK Compatibility Ideographs
56 1 0 0 1 0 22 0 34 FB00 FB4F 80 Alphabetic Presentation Forms
591 0 0 2 2 0 93 0 0 FB50 FDFF 688 Arabic Presentation Forms-A
0 16 0 0 0 0 0 0 0 FE00 FE0F 16 Variation Selectors
0 4 0 0 0 0 12 0 0 FE20 FE2F 16 Combining Half Marks
0 0 0 32 0 0 0 0 0 FE30 FE4F 32 CJK Compatibility Forms
0 0 0 21 5 0 6 0 0 FE50 FE6F 32 Small Form Variants
140 0 0 0 0 0 4 0 0 FE70 FEFF 144 Arabic Presentation Forms-B
162 0 10 30 23 0 15 26 0 FF00 FFEF 240 Halfwidth and Fullwidth Forms
0 0 0 0 2 0 14 0 0 FFF0 FFFF 16 Specials
88 0 0 0 0 0 40 0 0 10000 1007F 128 Linear B Syllabary
123 0 0 0 0 0 5 0 0 10080 100FF 128 Linear B Ideograms
0 0 45 2 10 0 7 0 0 10100 1013F 64 Aegean Numbers
31 0 4 0 0 0 13 0 0 10300 1032F 48 Old Italic
26 0 1 0 0 0 5 0 0 10330 1034F 32 Gothic
30 0 0 1 0 0 1 0 0 10380 1039F 32 Ugaritic
80 0 0 0 0 0 0 40 0 10400 1044F 80 Deseret
48 0 0 0 0 0 0 0 0 10450 1047F 48 Shavian
30 0 10 0 0 0 8 0 0 10480 104AF 48 Osmanya
55 0 0 0 0 0 9 0 0 10800 1083F 64 Cypriot Syllabary
0 0 0 0 246 0 10 0 0 1D000 1D0FF 256 Byzantine Musical Symbols
0 30 0 0 181 0 45 0 13 1D100 1D1FF 256 Musical Symbols
0 0 0 0 87 0 9 0 0 1D300 1D35F 96 Tai Xuan Jing Symbols
932 0 50 0 10 0 32 0 0 1D400 1D7FF 1024 Mathematical Alphanumeric Symbols
42711 0 0 0 0 0 9 0 0 20000 2A6DF 42720 CJK Unified Ideographs Extension B
542 0 0 0 0 0 2 0 542 2F800 2FA1F 544 CJK Compatibility Ideographs Supplement
0 0 0 0 0 0 128 0 0 E0000 E007F 128 Tags
0 240 0 0 0 0 0 0 0 E0100 E01EF 240 Variation Selectors Supplement
0 0 0 0 0 0 65536 0 0 F0000 FFFFF 65536 Supplementary Private Use Area-A
0 0 0 0 0 0 65536 0 0 100000 10FFFF 65536 Supplementary Private Use Area-B
90547 941 612 370 3754 21 142187 763 1926
$
BMP: nonblock 4112 unassigned 2247
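
A rough sketch of how one row of the table above might be computed from
Blocks.txt and UnicodeData.txt. It assumes Blocks.txt lines of the form
"0000..007F; Basic Latin", 15-field UnicodeData.txt lines with the general
category in field 2, the decomposition in field 5 and the uppercase mapping in
field 12, and "<..., First>"/"<..., Last>" pairs for large ranges. Code points
of a block without an entry are counted as other, which is what the rows above
suggest (the seven category counts add up to the block length). All names here
are illustrative; this is not the original tool.

```python
import bisect
from collections import Counter

# First letter of the general category -> column of the table.
MAJOR = {"L": "Let", "M": "Mar", "N": "Num", "P": "Pun",
         "S": "Sym", "Z": "Sep", "C": "Oth"}

def parse_blocks(lines):
    """Parse Blocks.txt-style lines into sorted (begin, end, name) tuples."""
    blocks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        beg, end = span.split("..")
        blocks.append((int(beg, 16), int(end, 16), name.strip()))
    return sorted(blocks)

def iter_records(lines):
    """Yield (code point, category, has uppercase, has canonical decomposition),
    expanding First/Last range pairs."""
    range_start = None
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        cp, name, cat = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp
        elif name.endswith(", Last>") and range_start is not None:
            for c in range(range_start, cp + 1):
                yield c, cat, False, False
            range_start = None
        else:
            decomp = fields[5]
            canonical = bool(decomp) and not decomp.startswith("<")
            yield cp, cat, bool(fields[12]), canonical

def block_table(block_lines, data_lines):
    """Return {block name: Counter} with the category, upC and deC columns."""
    blocks = parse_blocks(block_lines)
    starts = [b[0] for b in blocks]
    rows = {name: Counter() for _, _, name in blocks}
    for cp, cat, upper, canonical in iter_records(data_lines):
        i = bisect.bisect_right(starts, cp) - 1
        if i < 0 or cp > blocks[i][1]:
            continue  # code point lies outside every block
        row = rows[blocks[i][2]]
        row[MAJOR[cat[0]]] += 1
        row["upC"] += upper
        row["deC"] += canonical
    for beg, end, name in blocks:
        assigned = sum(rows[name][col] for col in MAJOR.values())
        rows[name]["Oth"] += (end - beg + 1) - assigned  # unassigned -> Oth
    return rows
```

With a complete Blocks.txt and UnicodeData.txt as input, block_table() yields
one Counter per block; the values depend on the Unicode version of the input
files.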
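

* detailed block stats
see
> http://unicode.org/Public/UNIDATA/UCD.html#General_Category_Values
Each line gives the block name, the begin (b, hex) and length (l) of the block,
and the number of code points per general category.
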
$
Basic Latin b0000 l128 Lu26 Ll26 Nd10 Zs1 Cc33 Pc1 Pd1 Ps3 Pe3 Po15 Sm6 Sc1 Sk2
Latin-1 Supplement b0080 l128 Lu30 Ll35 No6 Zs1 Cc32 Cf1 Pi1 Pf1 Po3 Sm4 Sc4 Sk4 So6
Latin Extended-A b0100 l128 Lu63 Ll65
Latin Extended-B b0180 l208 Cn25 Lu90 Ll84 Lt4 Lo5
IPA Extensions b0250 l96 Ll96
Spacing Modifier Letters b02B0 l80 Lm36 Sk44
Combining Diacritical Marks b0300 l112 Cn5 Mn107
Greek and Coptic b0370 l144 Cn24 Lu52 Ll60 Lm1 Po2 Sm1 Sk4
Cyrillic b0400 l256 Cn10 Lu120 Ll119 Mn4 Me2 So1
Cyrillic Supplementary b0500 l48 Cn32 Lu8 Ll8
Armenian b0530 l96 Cn10 Lu38 Ll39 Lm1 Pd1 Po7
Hebrew b0590 l112 Cn30 Lo30 Mn47 Po5
Arabic b0600 l256 Cn29 Lm3 Lo144 Mn40 Me1 Nd20 Cf5 Po9 So5
Syriac b0700 l80 Cn3 Lo34 Mn28 Cf1 Po14
Thaana b0780 l64 Cn14 Lo39 Mn11
Devanagari b0900 l128 Cn23 Lo66 Mn18 Mc8 Nd10 Po3
Bengali b0980 l128 Cn38 Lo52 Mn9 Mc10 Nd10 No6 Sc2 So1
Gurmukhi b0A00 l128 Cn51 Lo51 Mn12 Mc4 Nd10
Gujarati b0A80 l128 Cn45 Lo52 Mn13 Mc7 Nd10 Sc1
Oriya b0B00 l128 Cn47 Lo53 Mn8 Mc9 Nd10 So1
Tamil b0B80 l128 Cn59 Lo35 Mn3 Mc11 Nd9 No3 Sc1 So7
Telugu b0C00 l128 Cn48 Lo51 Mn12 Mc7 Nd10
Kannada b0C80 l128 Cn46 Lo53 Mn5 Mc14 Nd10
Malayalam b0D00 l128 Cn50 Lo52 Mn4 Mc12 Nd10
Sinhala b0D80 l128 Cn48 Lo59 Mn5 Mc15 Po1
Thai b0E00 l128 Cn41 Lm1 Lo56 Mn16 Nd10 Po3 Sc1
Lao b0E80 l128 Cn63 Lm1 Lo39 Mn15 Nd10
Tibetan b0F00 l256 Cn63 Lo47 Mn71 Mc3 Nd10 No10 Ps2 Pe2 Po16 So32
Myanmar b1000 l160 Cn82 Lo47 Mn10 Mc5 Nd10 Po6
Georgian b10A0 l96 Cn16 Lu38 Lo41 Po1
Hangul Jamo b1100 l256 Cn16 Lo240
Ethiopic b1200 l384 Cn39 Lo317 Nd9 No11 Po8
Cherokee b13A0 l96 Cn11 Lo85
Unified Canadian Aboriginal Syllabics b1400 l640 Cn10 Lo628 Po2
Ogham b1680 l32 Cn3 Lo26 Zs1 Ps1 Pe1
Runic b16A0 l96 Cn15 Lo75 Nl3 Po3
Tagalog b1700 l32 Cn12 Lo17 Mn3
Hanunoo b1720 l32 Cn9 Lo18 Mn3 Po2
Buhid b1740 l32 Cn12 Lo18 Mn2
Tagbanwa b1760 l32 Cn14 Lo16 Mn2
Khmer b1780 l128 Cn14 Lm1 Lo53 Mn20 Mc11 Nd10 No10 Cf2 Po6 Sc1
Mongolian b1800 l176 Cn21 Lm1 Lo128 Mn4 Nd10 Zs1 Pd1 Po10
Limbu b1900 l80 Cn14 Lo29 Mn9 Mc15 Nd10 Po2 So1
Tai Le b1950 l48 Cn13 Lo35
Khmer Symbols b19E0 l32 So32
Phonetic Extensions b1D00 l128 Cn20 Ll54 Lm54
Latin Extended Additional b1E00 l256 Cn10 Lu120 Ll126
Greek Extended b1F00 l256 Cn23 Lu69 Ll122 Lt27 Sk15
General Punctuation b2000 l112 Cn15 Zs14 Zl1 Zp1 Cf19 Pc3 Pd6 Ps3 Pe1 Pi5 Pf3 Po39 Sm2
Superscripts and Subscripts b2070 l48 Cn19 Ll2 No17 Ps2 Pe2 Sm6
Currency Symbols b20A0 l48 Cn30 Sc18
Combining Diacritical Marks for Symbols b20D0 l48 Cn21 Mn20 Me7
Letterlike Symbols b2100 l80 Cn5 Lu27 Ll12 Lo4 Sm6 So26
Number Forms b2150 l64 Cn15 Nl36 No13
Arrows b2190 l112 Sm27 So85
Mathematical Operators b2200 l256 Sm256
Miscellaneous Technical b2300 l256 Cn47 Ps2 Pe2 Po1 Sm32 So172
Control Pictures b2400 l64 Cn25 So39
Optical Character Recognition b2440 l32 Cn21 So11
Enclosed Alphanumerics b2460 l160 No82 So78
Box Drawing b2500 l128 So128
Block Elements b2580 l32 So32
Geometric Shapes b25A0 l96 Sm10 So86
Miscellaneous Symbols b2600 l256 Cn111 Sm1 So144
Dingbats b2700 l192 Cn18 No30 Ps7 Pe7 So130
Miscellaneous Mathematical Symbols-A b27C0 l48 Cn20 Ps3 Pe3 Sm22
Supplemental Arrows-A b27F0 l16 Sm16
Braille Patterns b2800 l256 So256
Supplemental Arrows-B b2900 l128 Sm128
Miscellaneous Mathematical Symbols-B b2980 l128 Ps14 Pe14 Sm100
Supplemental Mathematical Operators b2A00 l256 Sm256
Miscellaneous Symbols and Arrows b2B00 l256 Cn242 So14
CJK Radicals Supplement b2E80 l128 Cn13 So115
Kangxi Radicals b2F00 l224 Cn10 So214
Ideographic Description Characters b2FF0 l16 Cn4 So12
CJK Symbols and Punctuation b3000 l64 Lm7 Lo2 Mn6 Nl13 Zs1 Pd2 Ps10 Pe11 Po4 So8
Hiragana b3040 l96 Cn3 Lm2 Lo87 Mn2 Sk2
Katakana b30A0 l96 Lm3 Lo91 Pc1 Pd1
Bopomofo b3100 l48 Cn8 Lo40
Hangul Compatibility Jamo b3130 l96 Cn2 Lo94
Kanbun b3190 l16 No4 So12
Bopomofo Extended b31A0 l32 Cn8 Lo24
Katakana Phonetic Extensions b31F0 l16 Lo16
Enclosed CJK Letters and Months b3200 l256 Cn15 No50 So191
CJK Compatibility b3300 l256 So256
CJK Unified Ideographs Extension A b3400 l6592 Cn10 Lo6582
Yijing Hexagram Symbols b4DC0 l64 So64
CJK Unified Ideographs b4E00 l20992 Cn90 Lo20902
Yi Syllables bA000 l1168 Cn3 Lo1165
Yi Radicals bA490 l64 Cn9 So55
Hangul Syllables bAC00 l11184 Cn12 Lo11172
High Surrogates bD800 l896 Cs896
High Private Use Surrogates bDB80 l128 Cs128
Low Surrogates bDC00 l1024 Cs1024
Private Use Area bE000 l6400 Co6400
CJK Compatibility Ideographs bF900 l512 Cn151 Lo361
Alphabetic Presentation Forms bFB00 l80 Cn22 Ll12 Lo44 Mn1 Sm1
Arabic Presentation Forms-A bFB50 l688 Cn93 Lo591 Ps1 Pe1 Sc1 So1
Variation Selectors bFE00 l16 Mn16
Combining Half Marks bFE20 l16 Cn12 Mn4
CJK Compatibility Forms bFE30 l32 Pc5 Pd2 Ps9 Pe9 Po7
Small Form Variants bFE50 l32 Cn6 Pd2 Ps3 Pe3 Po13 Sm4 Sc1
Arabic Presentation Forms-B bFE70 l144 Cn3 Lo140 Cf1
Halfwidth and Fullwidth Forms bFF00 l240 Cn15 Lu26 Ll26 Lm3 Lo107 Nd10 Pc2 Pd1 Ps5 Pe5 Po17 Sm11 Sc5 Sk3 So4
Specials bFFF0 l16 Cn11 Cf3 So2
Linear B Syllabary b10000 l128 Cn40 Lo88
Linear B Ideograms b10080 l128 Cn5 Lo123
Aegean Numbers b10100 l64 Cn7 No45 Po2 So10
Old Italic b10300 l48 Cn13 Lo31 No4
Gothic b10330 l32 Cn5 Lo26 Nl1
Ugaritic b10380 l32 Cn1 Lo30 Po1
Deseret b10400 l80 Lu40 Ll40
Shavian b10450 l48 Lo48
Osmanya b10480 l48 Cn8 Lo30 Nd10
Cypriot Syllabary b10800 l64 Cn9 Lo55
Byzantine Musical Symbols b1D000 l256 Cn10 So246
Musical Symbols b1D100 l256 Cn37 Mn22 Mc8 Cf8 So181
Tai Xuan Jing Symbols b1D300 l96 Cn9 So87
Mathematical Alphanumeric Symbols b1D400 l1024 Cn32 Lu443 Ll489 Nd50 Sm10
CJK Unified Ideographs Extension B b20000 l42720 Cn9 Lo42711
CJK Compatibility Ideographs Supplement b2F800 l544 Cn2 Lo542
Tags bE0000 l128 Cn31 Cf97
Variation Selectors Supplement bE0100 l240 Mn240
Supplementary Private Use Area-A bF0000 l65536 Cn2 Co65534
Supplementary Private Use Area-B b100000 l65536 Cn2 Co65534
$
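
The detailed lines can be read back and folded into the seven major categories
of the earlier table. The sketch below assumes the line layout seen above
(block name, b + hex begin, l + decimal length, then two-letter category codes
each followed by a count); the regular expression and the function names are
illustrative, not part of OpenIsis.

```python
import re
from collections import Counter

# name, begin (hex), length, then zero or more category/count tokens
DETAIL = re.compile(r"^(.+?) b([0-9A-F]{4,6}) l(\d+)((?: [A-Z][a-z]\d+)*)$")

# First letter of the general category -> major category column.
MAJOR = {"L": "Let", "M": "Mar", "N": "Num", "P": "Pun",
         "S": "Sym", "Z": "Sep", "C": "Oth"}

def parse_detail(line):
    """Split one detailed stats line into (name, begin, length, {category: count})."""
    m = DETAIL.match(line.strip())
    if m is None:
        raise ValueError("not a detailed block stats line: %r" % line)
    name, beg, length, rest = m.groups()
    cats = {tok[:2]: int(tok[2:]) for tok in rest.split()}
    return name, int(beg, 16), int(length), cats

def fold(cats):
    """Sum general category counts into the major categories."""
    major = Counter()
    for cat, n in cats.items():
        major[MAJOR[cat[0]]] += n
    return major

if __name__ == "__main__":
    name, beg, length, cats = parse_detail(
        "Latin Extended-B b0180 l208 Cn25 Lu90 Ll84 Lt4 Lo5")
    major = fold(cats)
    # The counts cover the whole block, unassigned (Cn) code points included.
    assert sum(major.values()) == length
    print(name, "%04X" % beg, dict(major))
```

For Latin Extended-B this folds to 183 letters and 25 others, which agrees with
the corresponding row of the major category/block table.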