
Contents of /trunk2/openisis/doc/UniStat.txt


Revision 337 - (show annotations)
Thu Jun 10 19:22:40 2004 UTC (19 years, 9 months ago) by dpavlin
File MIME type: text/plain
File size: 13432 byte(s)
new trunk for webpac v2

Unicode block statistics of general character categories,
decomposition mappings and uppercase mappings, based on
Blocks.txt and UnicodeData.txt
> http://unicode.org/Public/UNIDATA/


* decomposition mappings
Number of decomposition mappings per formatting tag;
mappings without a tag are canonical.
see
> http://unicode.org/Public/UNIDATA/UCD.html#Character_Decomposition_Mappings

$
(canonical) 1926
circle 229
compat 673
final 240
font 1038
fraction 16
initial 171
isolated 238
medial 82
narrow 122
noBreak 5
small 26
square 205
sub 24
super 100
vertical 25
wide 104
$
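
Tag counts of this kind could be reproduced roughly as follows. This is a
minimal sketch in Python, not the script that produced this file: the function
name count_decomposition_tags and the SAMPLE records are illustrative
assumptions. It relies only on the UnicodeData.txt layout, in which fields are
separated by semicolons and the sixth field holds the decomposition mapping,
optionally prefixed by a tag in angle brackets.

```python
from collections import Counter


def count_decomposition_tags(lines):
    """Count decomposition mappings per tag; untagged ones are canonical."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 6 or not fields[5]:
            continue  # character has no decomposition mapping
        decomp = fields[5]
        if decomp.startswith("<"):
            tag = decomp[1:decomp.index(">")]  # e.g. "<noBreak> 0020"
        else:
            tag = "(canonical)"
        counts[tag] += 1
    return counts


# A few records in UnicodeData.txt format, for illustration only.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;",
]

for tag, n in sorted(count_decomposition_tags(SAMPLE).items()):
    print(tag, n)
```

Run over a complete UnicodeData.txt, the totals depend on the Unicode version
of that file, so they will only match the numbers above for the release this
table was generated from.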


* major category/block table
Categories are letter, mark, numeric, punctuation, symbol, separator and other.
The upC and deC columns give the number of characters that have an uppercase
mapping and a canonical decomposition mapping, respectively.
The final columns give the first and last code point, the length and the name
of each block. The last row gives the column totals.

$
Let Mar Num Pun Sym Sep Oth upC deC beg end len block
52 0 10 23 9 1 33 26 0 0000 007F 128 Basic Latin
65 0 6 5 18 1 33 32 53 0080 00FF 128 Latin-1 Supplement
128 0 0 0 0 0 0 63 108 0100 017F 128 Latin Extended-A
183 0 0 0 0 0 25 75 91 0180 024F 208 Latin Extended-B
96 0 0 0 0 0 0 19 0 0250 02AF 96 IPA Extensions
36 0 0 0 44 0 0 0 0 02B0 02FF 80 Spacing Modifier Letters
0 107 0 0 0 0 5 1 4 0300 036F 112 Combining Diacritical Marks
113 0 0 2 5 0 24 56 26 0370 03FF 144 Greek and Coptic
239 6 0 0 1 0 10 119 52 0400 04FF 256 Cyrillic
16 0 0 0 0 0 32 8 0 0500 052F 48 Cyrillic Supplementary
78 0 0 8 0 0 10 38 0 0530 058F 96 Armenian
30 47 0 5 0 0 30 0 0 0590 05FF 112 Hebrew
147 41 20 9 5 0 34 0 8 0600 06FF 256 Arabic
34 28 0 14 0 0 4 0 0 0700 074F 80 Syriac
39 11 0 0 0 0 14 0 0 0780 07BF 64 Thaana
66 26 10 3 0 0 23 0 11 0900 097F 128 Devanagari
52 19 16 0 3 0 38 0 5 0980 09FF 128 Bengali
51 16 10 0 0 0 51 0 6 0A00 0A7F 128 Gurmukhi
52 20 10 0 1 0 45 0 0 0A80 0AFF 128 Gujarati
53 17 10 0 1 0 47 0 5 0B00 0B7F 128 Oriya
35 14 12 0 8 0 59 0 4 0B80 0BFF 128 Tamil
51 19 10 0 0 0 48 0 1 0C00 0C7F 128 Telugu
53 19 10 0 0 0 46 0 5 0C80 0CFF 128 Kannada
52 16 10 0 0 0 50 0 3 0D00 0D7F 128 Malayalam
59 20 0 1 0 0 48 0 4 0D80 0DFF 128 Sinhala
57 16 10 3 1 0 41 0 0 0E00 0E7F 128 Thai
40 15 10 0 0 0 63 0 0 0E80 0EFF 128 Lao
47 74 20 20 32 0 63 0 17 0F00 0FFF 256 Tibetan
47 15 10 6 0 0 82 0 1 1000 109F 160 Myanmar
79 0 0 1 0 0 16 0 0 10A0 10FF 96 Georgian
240 0 0 0 0 0 16 0 0 1100 11FF 256 Hangul Jamo
317 0 20 8 0 0 39 0 0 1200 137F 384 Ethiopic
85 0 0 0 0 0 11 0 0 13A0 13FF 96 Cherokee
628 0 0 2 0 0 10 0 0 1400 167F 640 Unified Canadian Aboriginal Syllabics
26 0 0 2 0 1 3 0 0 1680 169F 32 Ogham
75 0 3 3 0 0 15 0 0 16A0 16FF 96 Runic
17 3 0 0 0 0 12 0 0 1700 171F 32 Tagalog
18 3 0 2 0 0 9 0 0 1720 173F 32 Hanunoo
18 2 0 0 0 0 12 0 0 1740 175F 32 Buhid
16 2 0 0 0 0 14 0 0 1760 177F 32 Tagbanwa
54 31 20 6 1 0 16 0 0 1780 17FF 128 Khmer
129 4 10 11 0 1 21 0 0 1800 18AF 176 Mongolian
29 24 10 2 1 0 14 0 0 1900 194F 80 Limbu
35 0 0 0 0 0 13 0 0 1950 197F 48 Tai Le
0 0 0 0 32 0 0 0 0 19E0 19FF 32 Khmer Symbols
108 0 0 0 0 0 20 0 0 1D00 1D7F 128 Phonetic Extensions
246 0 0 0 0 0 10 121 245 1E00 1EFF 256 Latin Extended Additional
218 0 0 0 15 0 23 97 229 1F00 1FFF 256 Greek Extended
0 0 0 60 2 16 34 0 2 2000 206F 112 General Punctuation
2 0 17 4 6 0 19 0 0 2070 209F 48 Superscripts and Subscripts
0 0 0 0 18 0 30 0 0 20A0 20CF 48 Currency Symbols
0 27 0 0 0 0 21 0 0 20D0 20FF 48 Combining Diacritical Marks for Symbols
43 0 0 0 32 0 5 0 3 2100 214F 80 Letterlike Symbols
0 0 49 0 0 0 15 16 0 2150 218F 64 Number Forms
0 0 0 0 112 0 0 0 6 2190 21FF 112 Arrows
0 0 0 0 256 0 0 0 38 2200 22FF 256 Mathematical Operators
0 0 0 5 204 0 47 0 2 2300 23FF 256 Miscellaneous Technical
0 0 0 0 39 0 25 0 0 2400 243F 64 Control Pictures
0 0 0 0 11 0 21 0 0 2440 245F 32 Optical Character Recognition
0 0 82 0 78 0 0 26 0 2460 24FF 160 Enclosed Alphanumerics
0 0 0 0 128 0 0 0 0 2500 257F 128 Box Drawing
0 0 0 0 32 0 0 0 0 2580 259F 32 Block Elements
0 0 0 0 96 0 0 0 0 25A0 25FF 96 Geometric Shapes
0 0 0 0 145 0 111 0 0 2600 26FF 256 Miscellaneous Symbols
0 0 30 14 130 0 18 0 0 2700 27BF 192 Dingbats
0 0 0 6 22 0 20 0 0 27C0 27EF 48 Miscellaneous Mathematical Symbols-A
0 0 0 0 16 0 0 0 0 27F0 27FF 16 Supplemental Arrows-A
0 0 0 0 256 0 0 0 0 2800 28FF 256 Braille Patterns
0 0 0 0 128 0 0 0 0 2900 297F 128 Supplemental Arrows-B
0 0 0 28 100 0 0 0 0 2980 29FF 128 Miscellaneous Mathematical Symbols-B
0 0 0 0 256 0 0 0 1 2A00 2AFF 256 Supplemental Mathematical Operators
0 0 0 0 14 0 242 0 0 2B00 2BFF 256 Miscellaneous Symbols and Arrows
0 0 0 0 115 0 13 0 0 2E80 2EFF 128 CJK Radicals Supplement
0 0 0 0 214 0 10 0 0 2F00 2FDF 224 Kangxi Radicals
0 0 0 0 12 0 4 0 0 2FF0 2FFF 16 Ideographic Description Characters
9 6 13 27 8 1 0 0 0 3000 303F 64 CJK Symbols and Punctuation
89 2 0 0 2 0 3 0 27 3040 309F 96 Hiragana
94 0 0 2 0 0 0 0 31 30A0 30FF 96 Katakana
40 0 0 0 0 0 8 0 0 3100 312F 48 Bopomofo
94 0 0 0 0 0 2 0 0 3130 318F 96 Hangul Compatibility Jamo
0 0 4 0 12 0 0 0 0 3190 319F 16 Kanbun
24 0 0 0 0 0 8 0 0 31A0 31BF 32 Bopomofo Extended
16 0 0 0 0 0 0 0 0 31F0 31FF 16 Katakana Phonetic Extensions
0 0 50 0 191 0 15 0 0 3200 32FF 256 Enclosed CJK Letters and Months
0 0 0 0 256 0 0 0 0 3300 33FF 256 CJK Compatibility
6582 0 0 0 0 0 10 0 0 3400 4DBF 6592 CJK Unified Ideographs Extension A
0 0 0 0 64 0 0 0 0 4DC0 4DFF 64 Yijing Hexagram Symbols
20902 0 0 0 0 0 90 0 0 4E00 9FFF 20992 CJK Unified Ideographs
1165 0 0 0 0 0 3 0 0 A000 A48F 1168 Yi Syllables
0 0 0 0 55 0 9 0 0 A490 A4CF 64 Yi Radicals
11172 0 0 0 0 0 12 0 0 AC00 D7AF 11184 Hangul Syllables
0 0 0 0 0 0 896 0 0 D800 DB7F 896 High Surrogates
0 0 0 0 0 0 128 0 0 DB80 DBFF 128 High Private Use Surrogates
0 0 0 0 0 0 1024 0 0 DC00 DFFF 1024 Low Surrogates
0 0 0 0 0 0 6400 0 0 E000 F8FF 6400 Private Use Area
361 0 0 0 0 0 151 0 349 F900 FAFF 512 CJK Compatibility Ideographs
56 1 0 0 1 0 22 0 34 FB00 FB4F 80 Alphabetic Presentation Forms
591 0 0 2 2 0 93 0 0 FB50 FDFF 688 Arabic Presentation Forms-A
0 16 0 0 0 0 0 0 0 FE00 FE0F 16 Variation Selectors
0 4 0 0 0 0 12 0 0 FE20 FE2F 16 Combining Half Marks
0 0 0 32 0 0 0 0 0 FE30 FE4F 32 CJK Compatibility Forms
0 0 0 21 5 0 6 0 0 FE50 FE6F 32 Small Form Variants
140 0 0 0 0 0 4 0 0 FE70 FEFF 144 Arabic Presentation Forms-B
162 0 10 30 23 0 15 26 0 FF00 FFEF 240 Halfwidth and Fullwidth Forms
0 0 0 0 2 0 14 0 0 FFF0 FFFF 16 Specials
88 0 0 0 0 0 40 0 0 10000 1007F 128 Linear B Syllabary
123 0 0 0 0 0 5 0 0 10080 100FF 128 Linear B Ideograms
0 0 45 2 10 0 7 0 0 10100 1013F 64 Aegean Numbers
31 0 4 0 0 0 13 0 0 10300 1032F 48 Old Italic
26 0 1 0 0 0 5 0 0 10330 1034F 32 Gothic
30 0 0 1 0 0 1 0 0 10380 1039F 32 Ugaritic
80 0 0 0 0 0 0 40 0 10400 1044F 80 Deseret
48 0 0 0 0 0 0 0 0 10450 1047F 48 Shavian
30 0 10 0 0 0 8 0 0 10480 104AF 48 Osmanya
55 0 0 0 0 0 9 0 0 10800 1083F 64 Cypriot Syllabary
0 0 0 0 246 0 10 0 0 1D000 1D0FF 256 Byzantine Musical Symbols
0 30 0 0 181 0 45 0 13 1D100 1D1FF 256 Musical Symbols
0 0 0 0 87 0 9 0 0 1D300 1D35F 96 Tai Xuan Jing Symbols
932 0 50 0 10 0 32 0 0 1D400 1D7FF 1024 Mathematical Alphanumeric Symbols
42711 0 0 0 0 0 9 0 0 20000 2A6DF 42720 CJK Unified Ideographs Extension B
542 0 0 0 0 0 2 0 542 2F800 2FA1F 544 CJK Compatibility Ideographs Supplement
0 0 0 0 0 0 128 0 0 E0000 E007F 128 Tags
0 240 0 0 0 0 0 0 0 E0100 E01EF 240 Variation Selectors Supplement
0 0 0 0 0 0 65536 0 0 F0000 FFFFF 65536 Supplementary Private Use Area-A
0 0 0 0 0 0 65536 0 0 100000 10FFFF 65536 Supplementary Private Use Area-B
90547 941 612 370 3754 21 142187 763 1926
$
BMP: nonblock 4112 unassigned 2247
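
One row of the table above could be computed along these lines. This is a
hedged sketch in Python under stated assumptions, not the original generator:
the names MAJOR, parse_blocks and block_row and the sample data are
illustrative. It assumes that code points of a block which are not listed in
UnicodeData.txt count towards Oth, which is what the rows above suggest. It
does not expand the "First>"/"Last>" range pairs that UnicodeData.txt uses for
large ranges such as the CJK ideographs, so such blocks would need extra
handling.

```python
from collections import Counter

# First letter of the general category -> column name used in the table.
MAJOR = {"L": "Let", "M": "Mar", "N": "Num", "P": "Pun",
         "S": "Sym", "Z": "Sep", "C": "Oth"}


def parse_blocks(lines):
    """Yield (first, last, name) for each 'XXXX..YYYY; Name' line."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        first, last = span.split("..")
        yield int(first, 16), int(last, 16), name.strip()


def block_row(first, last, records):
    """Count major categories, upC and deC for the block first..last."""
    row = Counter()
    listed = 0
    for line in records:
        f = line.rstrip("\n").split(";")
        if not first <= int(f[0], 16) <= last:
            continue
        listed += 1
        row[MAJOR[f[2][0]]] += 1
        if f[12]:                              # uppercase mapping present
            row["upC"] += 1
        if f[5] and not f[5].startswith("<"):  # untagged mapping = canonical
            row["deC"] += 1
    # Assumption: code points without a record are unassigned, counted as Oth.
    row["Oth"] += (last - first + 1) - listed
    return row


# Tiny illustrative inputs; real runs would read the full files.
BLOCKS = ["# sample", "0000..007F; Basic Latin", "0080..00FF; Latin-1 Supplement"]
RECORDS = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;",
]

for first, last, name in parse_blocks(BLOCKS):
    row = block_row(first, last, RECORDS)
    print(name, last - first + 1, dict(row))
```

With only four sample records, most code points of each block fall into Oth;
the counts become meaningful only with the complete data files.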


* detailed block stats
Each line gives the block name, its first code point (b, hexadecimal),
its length (l) and the number of characters per general category.
see
> http://unicode.org/Public/UNIDATA/UCD.html#General_Category_Values

$
Basic Latin b0000 l128 Lu26 Ll26 Nd10 Zs1 Cc33 Pc1 Pd1 Ps3 Pe3 Po15 Sm6 Sc1 Sk2
Latin-1 Supplement b0080 l128 Lu30 Ll35 No6 Zs1 Cc32 Cf1 Pi1 Pf1 Po3 Sm4 Sc4 Sk4 So6
Latin Extended-A b0100 l128 Lu63 Ll65
Latin Extended-B b0180 l208 Cn25 Lu90 Ll84 Lt4 Lo5
IPA Extensions b0250 l96 Ll96
Spacing Modifier Letters b02B0 l80 Lm36 Sk44
Combining Diacritical Marks b0300 l112 Cn5 Mn107
Greek and Coptic b0370 l144 Cn24 Lu52 Ll60 Lm1 Po2 Sm1 Sk4
Cyrillic b0400 l256 Cn10 Lu120 Ll119 Mn4 Me2 So1
Cyrillic Supplementary b0500 l48 Cn32 Lu8 Ll8
Armenian b0530 l96 Cn10 Lu38 Ll39 Lm1 Pd1 Po7
Hebrew b0590 l112 Cn30 Lo30 Mn47 Po5
Arabic b0600 l256 Cn29 Lm3 Lo144 Mn40 Me1 Nd20 Cf5 Po9 So5
Syriac b0700 l80 Cn3 Lo34 Mn28 Cf1 Po14
Thaana b0780 l64 Cn14 Lo39 Mn11
Devanagari b0900 l128 Cn23 Lo66 Mn18 Mc8 Nd10 Po3
Bengali b0980 l128 Cn38 Lo52 Mn9 Mc10 Nd10 No6 Sc2 So1
Gurmukhi b0A00 l128 Cn51 Lo51 Mn12 Mc4 Nd10
Gujarati b0A80 l128 Cn45 Lo52 Mn13 Mc7 Nd10 Sc1
Oriya b0B00 l128 Cn47 Lo53 Mn8 Mc9 Nd10 So1
Tamil b0B80 l128 Cn59 Lo35 Mn3 Mc11 Nd9 No3 Sc1 So7
Telugu b0C00 l128 Cn48 Lo51 Mn12 Mc7 Nd10
Kannada b0C80 l128 Cn46 Lo53 Mn5 Mc14 Nd10
Malayalam b0D00 l128 Cn50 Lo52 Mn4 Mc12 Nd10
Sinhala b0D80 l128 Cn48 Lo59 Mn5 Mc15 Po1
Thai b0E00 l128 Cn41 Lm1 Lo56 Mn16 Nd10 Po3 Sc1
Lao b0E80 l128 Cn63 Lm1 Lo39 Mn15 Nd10
Tibetan b0F00 l256 Cn63 Lo47 Mn71 Mc3 Nd10 No10 Ps2 Pe2 Po16 So32
Myanmar b1000 l160 Cn82 Lo47 Mn10 Mc5 Nd10 Po6
Georgian b10A0 l96 Cn16 Lu38 Lo41 Po1
Hangul Jamo b1100 l256 Cn16 Lo240
Ethiopic b1200 l384 Cn39 Lo317 Nd9 No11 Po8
Cherokee b13A0 l96 Cn11 Lo85
Unified Canadian Aboriginal Syllabics b1400 l640 Cn10 Lo628 Po2
Ogham b1680 l32 Cn3 Lo26 Zs1 Ps1 Pe1
Runic b16A0 l96 Cn15 Lo75 Nl3 Po3
Tagalog b1700 l32 Cn12 Lo17 Mn3
Hanunoo b1720 l32 Cn9 Lo18 Mn3 Po2
Buhid b1740 l32 Cn12 Lo18 Mn2
Tagbanwa b1760 l32 Cn14 Lo16 Mn2
Khmer b1780 l128 Cn14 Lm1 Lo53 Mn20 Mc11 Nd10 No10 Cf2 Po6 Sc1
Mongolian b1800 l176 Cn21 Lm1 Lo128 Mn4 Nd10 Zs1 Pd1 Po10
Limbu b1900 l80 Cn14 Lo29 Mn9 Mc15 Nd10 Po2 So1
Tai Le b1950 l48 Cn13 Lo35
Khmer Symbols b19E0 l32 So32
Phonetic Extensions b1D00 l128 Cn20 Ll54 Lm54
Latin Extended Additional b1E00 l256 Cn10 Lu120 Ll126
Greek Extended b1F00 l256 Cn23 Lu69 Ll122 Lt27 Sk15
General Punctuation b2000 l112 Cn15 Zs14 Zl1 Zp1 Cf19 Pc3 Pd6 Ps3 Pe1 Pi5 Pf3 Po39 Sm2
Superscripts and Subscripts b2070 l48 Cn19 Ll2 No17 Ps2 Pe2 Sm6
Currency Symbols b20A0 l48 Cn30 Sc18
Combining Diacritical Marks for Symbols b20D0 l48 Cn21 Mn20 Me7
Letterlike Symbols b2100 l80 Cn5 Lu27 Ll12 Lo4 Sm6 So26
Number Forms b2150 l64 Cn15 Nl36 No13
Arrows b2190 l112 Sm27 So85
Mathematical Operators b2200 l256 Sm256
Miscellaneous Technical b2300 l256 Cn47 Ps2 Pe2 Po1 Sm32 So172
Control Pictures b2400 l64 Cn25 So39
Optical Character Recognition b2440 l32 Cn21 So11
Enclosed Alphanumerics b2460 l160 No82 So78
Box Drawing b2500 l128 So128
Block Elements b2580 l32 So32
Geometric Shapes b25A0 l96 Sm10 So86
Miscellaneous Symbols b2600 l256 Cn111 Sm1 So144
Dingbats b2700 l192 Cn18 No30 Ps7 Pe7 So130
Miscellaneous Mathematical Symbols-A b27C0 l48 Cn20 Ps3 Pe3 Sm22
Supplemental Arrows-A b27F0 l16 Sm16
Braille Patterns b2800 l256 So256
Supplemental Arrows-B b2900 l128 Sm128
Miscellaneous Mathematical Symbols-B b2980 l128 Ps14 Pe14 Sm100
Supplemental Mathematical Operators b2A00 l256 Sm256
Miscellaneous Symbols and Arrows b2B00 l256 Cn242 So14
CJK Radicals Supplement b2E80 l128 Cn13 So115
Kangxi Radicals b2F00 l224 Cn10 So214
Ideographic Description Characters b2FF0 l16 Cn4 So12
CJK Symbols and Punctuation b3000 l64 Lm7 Lo2 Mn6 Nl13 Zs1 Pd2 Ps10 Pe11 Po4 So8
Hiragana b3040 l96 Cn3 Lm2 Lo87 Mn2 Sk2
Katakana b30A0 l96 Lm3 Lo91 Pc1 Pd1
Bopomofo b3100 l48 Cn8 Lo40
Hangul Compatibility Jamo b3130 l96 Cn2 Lo94
Kanbun b3190 l16 No4 So12
Bopomofo Extended b31A0 l32 Cn8 Lo24
Katakana Phonetic Extensions b31F0 l16 Lo16
Enclosed CJK Letters and Months b3200 l256 Cn15 No50 So191
CJK Compatibility b3300 l256 So256
CJK Unified Ideographs Extension A b3400 l6592 Cn10 Lo6582
Yijing Hexagram Symbols b4DC0 l64 So64
CJK Unified Ideographs b4E00 l20992 Cn90 Lo20902
Yi Syllables bA000 l1168 Cn3 Lo1165
Yi Radicals bA490 l64 Cn9 So55
Hangul Syllables bAC00 l11184 Cn12 Lo11172
High Surrogates bD800 l896 Cs896
High Private Use Surrogates bDB80 l128 Cs128
Low Surrogates bDC00 l1024 Cs1024
Private Use Area bE000 l6400 Co6400
CJK Compatibility Ideographs bF900 l512 Cn151 Lo361
Alphabetic Presentation Forms bFB00 l80 Cn22 Ll12 Lo44 Mn1 Sm1
Arabic Presentation Forms-A bFB50 l688 Cn93 Lo591 Ps1 Pe1 Sc1 So1
Variation Selectors bFE00 l16 Mn16
Combining Half Marks bFE20 l16 Cn12 Mn4
CJK Compatibility Forms bFE30 l32 Pc5 Pd2 Ps9 Pe9 Po7
Small Form Variants bFE50 l32 Cn6 Pd2 Ps3 Pe3 Po13 Sm4 Sc1
Arabic Presentation Forms-B bFE70 l144 Cn3 Lo140 Cf1
Halfwidth and Fullwidth Forms bFF00 l240 Cn15 Lu26 Ll26 Lm3 Lo107 Nd10 Pc2 Pd1 Ps5 Pe5 Po17 Sm11 Sc5 Sk3 So4
Specials bFFF0 l16 Cn11 Cf3 So2
Linear B Syllabary b10000 l128 Cn40 Lo88
Linear B Ideograms b10080 l128 Cn5 Lo123
Aegean Numbers b10100 l64 Cn7 No45 Po2 So10
Old Italic b10300 l48 Cn13 Lo31 No4
Gothic b10330 l32 Cn5 Lo26 Nl1
Ugaritic b10380 l32 Cn1 Lo30 Po1
Deseret b10400 l80 Lu40 Ll40
Shavian b10450 l48 Lo48
Osmanya b10480 l48 Cn8 Lo30 Nd10
Cypriot Syllabary b10800 l64 Cn9 Lo55
Byzantine Musical Symbols b1D000 l256 Cn10 So246
Musical Symbols b1D100 l256 Cn37 Mn22 Mc8 Cf8 So181
Tai Xuan Jing Symbols b1D300 l96 Cn9 So87
Mathematical Alphanumeric Symbols b1D400 l1024 Cn32 Lu443 Ll489 Nd50 Sm10
CJK Unified Ideographs Extension B b20000 l42720 Cn9 Lo42711
CJK Compatibility Ideographs Supplement b2F800 l544 Cn2 Lo542
Tags bE0000 l128 Cn31 Cf97
Variation Selectors Supplement bE0100 l240 Mn240
Supplementary Private Use Area-A bF0000 l65536 Cn2 Co65534
Supplementary Private Use Area-B b100000 l65536 Cn2 Co65534
$
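
The per-block lines above are regular enough to be read back by a program.
The following Python sketch parses one such line and folds the two-letter
categories into their major class. The names LINE, parse_detail and
major_totals are illustrative assumptions, not part of OpenIsis, and the
parser expects a line that holds only the block name, the b and l tokens and
the category counts.

```python
import re

# Block name, then b<first code point, hex>, l<length>,
# then any number of <two-letter category><count> tokens.
LINE = re.compile(
    r"^(?P<name>.+?) b(?P<beg>[0-9A-F]+) l(?P<len>\d+)"
    r"(?P<cats>(?: [A-Z][a-z]\d+)*)$"
)


def parse_detail(line):
    """Split one detail line into (name, first code point, length, counts)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError("not a block detail line: %r" % line)
    counts = {tok[:2]: int(tok[2:]) for tok in m.group("cats").split()}
    return m.group("name"), int(m.group("beg"), 16), int(m.group("len")), counts


def major_totals(counts):
    """Sum two-letter categories by major class (their first letter)."""
    totals = {}
    for cat, n in counts.items():
        totals[cat[0]] = totals.get(cat[0], 0) + n
    return totals


name, beg, length, counts = parse_detail(
    "Greek and Coptic b0370 l144 Cn24 Lu52 Ll60 Lm1 Po2 Sm1 Sk4"
)
# The category counts of a line should add up to the block length.
assert sum(counts.values()) == length
print(name, hex(beg), major_totals(counts))
```

For the Greek and Coptic line the folded totals are 113 letters, 2 punctuation
characters, 5 symbols and 24 others, which agrees with the corresponding row
of the major category table.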