Tamil All Character Encoding (TACE16) is a scheme for encoding the Tamil script in the Private Use Area of Unicode, implementing a syllabary-based character model differing from the modified-ISCII model used by Unicode's existing Tamil implementation.[1][2]

Keyboard drivers and fonts

The keyboard driver for this encoding scheme is available on the Tamil Virtual Academy website for free.[3][4] It uses the Tamil 99 and Tamil Typewriter keyboard layouts, which are approved by the Government of Tamil Nadu, and maps input keystrokes to their corresponding characters in the TACE16 scheme.[2] To read files created using TACE16, the corresponding Unicode Tamil fonts are also available on the same website.[3][4] These fonts provide glyphs not only for TACE16 characters but also for the ASCII and Tamil characters in their standard Unicode blocks, so they remain backward compatible with existing files created using the Tamil Unicode block.

Character set

All the characters of this encoding scheme are located in the private use area of the Basic Multilingual Plane of Unicode's Universal Coded Character Set.
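
As a rough illustration of what placement in the Private Use Area means in practice, the following Python sketch checks whether a character falls in the BMP Private Use Area (U+E000 to U+F8FF). The helper name `is_bmp_private_use` is an assumption made for this example, and U+E210 is simply a code point taken from the U+E21_ row of the chart in this section.

```python
import unicodedata

# The BMP Private Use Area spans U+E000..U+F8FF (inclusive).
BMP_PUA = range(0xE000, 0xF900)


def is_bmp_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in the BMP Private Use Area."""
    return ord(ch) in BMP_PUA


tace16_char = chr(0xE210)  # a code point from the U+E21_ row of the chart
unicode_ka = "\u0B95"      # TAMIL LETTER KA in the standard Tamil block

print(is_bmp_private_use(tace16_char))    # True
print(is_bmp_private_use(unicode_ka))     # False
print(unicodedata.category(tace16_char))  # Co (private use)
```

Because private-use code points carry no standard meaning, software that has not been told about TACE16 sees such a character only as general category `Co`, not as Tamil text.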

Tamil All Character Encoding (TACE16) Character Set[5]
Vowels→ ? A ā I ī U ū E ē Ai O ō Au (Miscellaneous)
Consonants
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
(Symbols) U+E10_ ? ? ? ? ? ? ? ? ??? ?
(Numbers) U+E18_ ? ? ? ? ? ? ? ? ? ? ? ? ?
(Fractions) U+E1A_ ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
? U+E1F_ ? ? ? ? ? ? ? ? ? ? ? ?
? U+E20_ ? ? ? ? ? ? ? ? ? ? ? ? ?
K U+E21_ க் க கா கி கீ கு கூ கெ கே கை கொ கோ கௌ
Ng U+E22_ ங் ங ஙா ஙி ஙீ ஙு ஙூ ஙெ ஙே ஙை ஙொ ஙோ ஙௌ
C U+E23_ ச் ச சா சி சீ சு சூ செ சே சை சொ சோ சௌ
Ñ U+E24_ ஞ் ஞ ஞா ஞி ஞீ ஞு ஞூ ஞெ ஞே ஞை ஞொ ஞோ ஞௌ
Ṭ U+E25_ ட் ட டா டி டீ டு டூ டெ டே டை டொ டோ டௌ
Ṇ U+E26_ ண் ண ணா ணி ணீ ணு ணூ ணெ ணே ணை ணொ ணோ ணௌ
T U+E27_ த் த தா தி தீ து தூ தெ தே தை தொ தோ தௌ
N U+E28_ ந் ந நா நி நீ நு நூ நெ நே நை நொ நோ நௌ
P U+E29_ ப் ப பா பி பீ பு பூ பெ பே பை பொ போ பௌ
M U+E2A_ ம் ம மா மி மீ மு மூ மெ மே மை மொ மோ மௌ
Y U+E2B_ ய் ய யா யி யீ யு யூ யெ யே யை யொ யோ யௌ
R U+E2C_ ர் ர ரா ரி ரீ ரு ரூ ரெ ரே ரை ரொ ரோ ரௌ
L U+E2D_ ல் ல லா லி லீ லு லூ லெ லே லை லொ லோ லௌ
V U+E2E_ வ் வ வா வி வீ வு வூ வெ வே வை வொ வோ வௌ
Ḻ U+E2F_ ழ் ழ ழா ழி ழீ ழு ழூ ழெ ழே ழை ழொ ழோ ழௌ
Ḷ U+E30_ ள் ள ளா ளி ளீ ளு ளூ ளெ ளே ளை ளொ ளோ ளௌ
Ṟ U+E31_ ற் ற றா றி றீ று றூ றெ றே றை றொ றோ றௌ
Ṉ U+E32_ ன் ன னா னி னீ னு னூ னெ னே னை னொ னோ னௌ
Grantha characters
J U+E33_ ஜ் ஜ ஜா ஜி ஜீ ஜு ஜூ ஜெ ஜே ஜை ஜொ ஜோ ஜௌ
Sh U+E34_ ஶ் ஶ ஶா ஶி ஶீ ஶு ஶூ ஶெ ஶே ஶை ஶொ ஶோ ஶௌ
Ṣ U+E35_ ஷ் ஷ ஷா ஷி ஷீ ஷு ஷூ ஷெ ஷே ஷை ஷொ ஷோ ஷௌ
S U+E36_ ஸ் ஸ ஸா ஸி ஸீ ஸு ஸூ ஸெ ஸே ஸை ஸொ ஸோ ஸௌ
H U+E37_ ஹ் ஹ ஹா ஹி ஹீ ஹு ஹூ ஹெ ஹே ஹை ஹொ ஹோ ஹௌ
Kṣ U+E38_ க்ஷ் க்ஷ க்ஷா க்ஷி க்ஷீ க்ஷு க்ஷூ க்ஷெ க்ஷே க்ஷை க்ஷொ க்ஷோ க்ஷௌ ????
Legend:
Syllabograms with irregular glyphs, which inherently need to be handled individually by a font.[a]
Newly added. Not present in Unicode version 6.3.
Corresponds to a character in the Tamil Supplement block, added in Unicode version 12 (2019)
Allocated for research (NLP)

Comparison of TACE16 to present Tamil Unicode

Criticism of the standard Unicode character model for Tamil

Unicode's encoding models for Devanagari, Tamil, Kannada, Sinhala and emoji require use of the invisible zero-width joiner and zero-width non-joiner characters.

The existing Unicode character model for Tamil is, like most of Indic Unicode,[b] an abugida-based model derived from ISCII. It has been criticized for several reasons.[1]

Unicode represents only 31 Tamil base characters as single code points, out of 247 grapheme clusters. These include stand-alone vowels, and 23 basic consonant glyphs (which, due to not bearing a virama, nonetheless denote a syllable with both a consonant and a vowel when used on their own). The others are represented as sequences of code points, requiring software support for advanced typography features (such as Apple Advanced Typography, Graphite, or OpenType advanced typography) to render correctly. This also requires the use of invisible zero-width joiner and zero-width non-joiner characters in places where the desired grapheme cluster would otherwise be ambiguous. This complexity can result in security vulnerabilities and ambiguous combinations, can require the use of an exception table to forbid invalid combinations of code points, and can necessitate the use of string normalization to compare two strings for equality.
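
The need for normalization when comparing strings can be seen with a single syllable. The Python sketch below, which uses only the standard `unicodedata` module, writes the syllable "ko" in two canonically equivalent ways and shows that a plain comparison fails while a comparison after NFC normalization succeeds. The helper name `same_text` is an assumption made for this example.

```python
import unicodedata

# The syllable "ko" written two ways that are canonically equivalent.
one_sign = "\u0B95\u0BCA"         # KA + VOWEL SIGN O
two_signs = "\u0B95\u0BC6\u0BBE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

# Each spelling is a sequence of several code points, not one character.
for label, s in (("one_sign", one_sign), ("two_signs", two_signs)):
    print(label, len(s), [unicodedata.name(c) for c in s])

# A naive comparison treats the two spellings as different strings.
print(one_sign == two_signs)  # False


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(one_sign, two_signs))  # True
```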

Additionally, since syllables with both a consonant and a vowel form 64 to 70% of Tamil text, an abugida-based model which encodes the consonant and vowel parts as separate code points is inefficient, in terms of how long a string needs to be to contain a given piece of text, in comparison with a syllabary-based model.
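
The difference in string length can be sketched in Python. The row bases below (U+E21_ for K through U+E32_ for the last of the 18 native consonants) are read off the character-set chart earlier in this article, and the column offsets assume that chart columns _0 to _C line up with the vowel header (no vowel, A, ā, and so on). The function name `to_tace16` is hypothetical, the input is assumed to be NFC-normalized, and stand-alone vowels, symbols and Grantha letters are passed through unchanged, so this is an illustration of the idea rather than a full converter.

```python
# The 18 native consonants in traditional order, one chart row each.
CONSONANTS = [0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
              0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
              0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9]

# Combining signs in chart-column order; column 1 (bare consonant) has no sign.
SIGNS = {0x0BCD: 0x0,  # pulli (pure consonant)
         0x0BBE: 0x2, 0x0BBF: 0x3, 0x0BC0: 0x4, 0x0BC1: 0x5,
         0x0BC2: 0x6, 0x0BC6: 0x7, 0x0BC7: 0x8, 0x0BC8: 0x9,
         0x0BCA: 0xA, 0x0BCB: 0xB, 0x0BCC: 0xC}

# Assumed row base for each consonant: U+E210, U+E220, ... U+E320.
ROW = {cp: 0xE210 + 0x10 * i for i, cp in enumerate(CONSONANTS)}


def to_tace16(text: str) -> list[int]:
    """Map consonant(+sign) sequences to one assumed TACE16 code point each."""
    out, i = [], 0
    while i < len(text):
        cp = ord(text[i])
        if cp in ROW:
            nxt = ord(text[i + 1]) if i + 1 < len(text) else None
            if nxt in SIGNS:
                out.append(ROW[cp] + SIGNS[nxt])  # consonant + sign
                i += 2
                continue
            out.append(ROW[cp] + 0x1)             # bare consonant (inherent a)
        else:
            out.append(cp)                        # not handled: pass through
        i += 1
    return out


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word "Tamil" in Unicode Tamil
print(len(word))                          # 5 code points
print([hex(c) for c in to_tace16(word)])  # ['0xe271', '0xe2a3', '0xe2f0']
```

Under these assumptions the five Unicode code points collapse to three syllable code points, one per written letter.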

Furthermore, ISCII is primarily an encoding of Devanagari, and the ISCII encodings of other Brahmic scripts (including Tamil) encode characters over the code points of the corresponding characters in Devanagari ISCII. Although Unicode encodes the Brahmic scripts separately from one another, the Tamil block mirrors the ISCII layout (with Devanagari-style character ordering, and reserved space in positions corresponding to Devanagari characters with no Tamil equivalent); consequently, the characters are not in the natural sequence order, and strings collated by code point (analogous to "ASCIIbetical" sorting of English text) will not produce the expected sorting order. It requires a complex collation algorithm for arranging them in the natural order.
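
The ordering mismatch is easy to reproduce. The Python sketch below sorts the 18 native Tamil consonants by code point and compares the result with their traditional order. The `rank` lookup is a deliberately simplified stand-in for a real collation algorithm and handles only single letters.

```python
# The 18 native consonants in traditional (dictionary) order.
TRADITIONAL = [chr(cp) for cp in (
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
    0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
    0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]

# Sorting by code point follows the layout of the Unicode Tamil block.
by_code_point = sorted(TRADITIONAL)
print(by_code_point == TRADITIONAL)  # False

# Example: NNNA (U+0BA9) sorts before PA (U+0BAA) by code point,
# although it comes last in the traditional order.
nnna, pa = "\u0BA9", "\u0BAA"
print(by_code_point.index(nnna) < by_code_point.index(pa))  # True

# A custom sort key restores the traditional order for single letters.
rank = {letter: i for i, letter in enumerate(TRADITIONAL)}
by_tradition = sorted(by_code_point, key=rank.__getitem__)
print(by_tradition == TRADITIONAL)   # True
```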

TACE16 in comparison

The following data provides a comparison of current Unicode Tamil vs. TACE16 on e-governance and browsing:[1][better source needed]

  • TACE16 is about 5.46 to 11.94 percent more efficient than Unicode Tamil for data storage[clarification needed].
  • TACE16 is about 18.69 to 22.99 percent more efficient than Unicode Tamil for sorting index data.
  • TACE16 is about 25.39 percent more efficient than Unicode Tamil when the entire data is Tamil. The default (binary) collation sequence that results from using the code-space values in TACE16 does not follow Tamil dictionary order.
  • TACE16 sorts about 0.31 to 16.96 percent faster than Unicode Tamil.
  • Index creation on TACE16 data is 36.7 percent faster than on Unicode Tamil data.
  • For full-key search on indexed fields, TACE16 performs up to 24.07 percent better than Unicode Tamil. On non-indexed fields, it performs up to 20.9 percent better.
  • Rendering of static Tamil data works with TACE16.

TACE16 improves both processing time and processing space. It covers all general Tamil text, it is sequential, and it is unambiguous, with each code point corresponding to exactly one character.[1][better source needed] TACE16 takes fewer instruction cycles than Unicode Tamil and also allows programming based on Tamil grammar[clarification needed], which requires additional framework development in Unicode Tamil.

Responses by the Unicode Consortium

The Unicode Consortium publishes a dedicated FAQ page on the Tamil script which responds to some of the criticisms. In defence of the ISCII model, the Consortium notes that expert linguists, typographers and programmers were involved in its development, but acknowledges that compromises were made due to ISCII being constrained to single-byte extended ASCII. The Consortium points out that Unicode Tamil is now implemented by all major operating systems and web browsers, and maintains that it should be used in open interchange contexts, such as online, since tools such as search engines would not necessarily be able to identify or interpret a sequence of Unicode private-use code points as Tamil text. However, the Consortium does not object to the use of Private-Use Area schemes, including TACE16, internally to particular processes for which they are useful. In particular, it highlights that both markup schemes and alternative encoding schemes may be used by researchers for specialised purposes such as natural-language processing.[6]

Unicode defines normative named-sequences for all Tamil pure consonants and syllables which are represented with sequences of more than one code point, and a dedicated table is published as part of the Unicode Standard listing all of these sequences, in their traditional order, along with their correct glyphs. The Consortium points out that it has been open to accepting proposals for characters for which no existing Unicode representation exists: for example, adding several historical fractions and other symbols as the Tamil Supplement block in version 12.0 in 2019.[6]
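
Named sequences can be resolved programmatically. As a minimal sketch, Python's standard `unicodedata.lookup` function (which has supported named sequences since Python 3.3) returns the full code point sequence for a sequence name. The two names used here are taken from the Unicode named-sequence list as the author of this example recalls it, so treat them as assumptions to check against the installed Unicode data.

```python
import unicodedata

# Look up a syllable and a pure consonant by their named-sequence names.
moo = unicodedata.lookup("TAMIL SYLLABLE MOO")
kss = unicodedata.lookup("TAMIL CONSONANT KSS")

# Each name resolves to a string of several code points.
for label, seq in (("MOO", moo), ("KSS", kss)):
    print(label, [f"U+{ord(c):04X}" for c in seq])
```

This gives applications a stable, standard name for each syllable without assigning it a code point of its own.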

Regarding collation, the Consortium argues that obtaining the correct result from sorting by code point is the exception rather than the rule, highlighting that, in unmodified ASCIIbetical ordering, the uppercase Latin letter Z sorts before the lowercase letter a, and also highlighting that collation rules often differ by language. Regarding space efficiency, the Consortium argues that the storage space and bandwidth taken up by text are usually far overshadowed by accompanying media such as images and video, and that text content compresses well under general-purpose methods such as Deflate (originally from the ZIP file format, standardized in RFC 1951 and integrated in HTTP as a generic encoding scheme).[6]
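
Both points can be checked with a few lines of Python. The sketch below first shows the ASCIIbetical ordering of "Z" and "a", then compresses a repetitive sample of UTF-8 encoded Unicode Tamil with the standard `zlib` module, which wraps Deflate in a small zlib header. The sample text is an artificial assumption chosen for this example, so the ratio it prints says nothing about real-world documents.

```python
import zlib

# Code point order puts every uppercase ASCII letter before any lowercase one.
print(sorted(["apple", "Zebra"]))  # ['Zebra', 'apple']

# An artificial, highly repetitive sample of Unicode Tamil text.
sample = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD " * 200
raw = sample.encode("utf-8")
packed = zlib.compress(raw)

print(len(raw), len(packed))           # sizes in bytes before and after
print(zlib.decompress(packed) == raw)  # True: compression is lossless
```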

Unicode Stability Policy

When first published (version 1.0.0), Unicode made only limited stability guarantees. As such, the original Tibetan block was deleted in version 1.0.1 (and its space has since been occupied by the Myanmar block), and the original block for Korean syllables was deleted in version 2.0 (and is now occupied by CJK Unified Ideographs Extension A). Both the current Hangul Syllables block for Korean syllables, and the current Tibetan block, date back to Unicode 2.0. This was done on the assumption that little or no existing content using Unicode for those writing systems existed,[7] since it would break compatibility with all existing Unicode content in, and input methods for, those writing systems. After this so-dubbed "Korean mess", the responsible committees pledged not to make such a compatibility-breaking change ever again,[7] which now forms part of the Unicode Stability Policy.[8]

This stability policy has been upheld ever since, in spite of demands to re-encode or change the character model for both Tibetan and Korean a second time, made by China and North Korea respectively.[9][10][11][12] Likewise in relation to Tamil, the Consortium emphasises the "crucial issue of maintaining the stability of the standard for existing implementations", and argues that "the resulting costs and impact of destabilizing the standard" would substantially outweigh any efficiency benefits in processing speed or storage space.[6]

A proposal to re-encode Tamil[13] was rejected by the Unicode Consortium, which stated that re-encoding would be damaging and that there was no convincing evidence that the Unicode Tamil encoding is deficient.[14]

Alternatives

Open-Tamil

The Open-Tamil project[15] provides many common Tamil text-processing operations. It claims Level-1 compliance for Tamil text processing without using TACE16, but it is built on the extra programming logic that Unicode Tamil requires.
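
To give a sense of the kind of extra logic involved, the following Python sketch splits Unicode Tamil text into written letters by attaching each combining mark to the preceding base character. It uses only the standard library and is not Open-Tamil's API; the function name `split_letters` is an assumption made for this example, and conjuncts such as the Grantha cluster Kṣ would be split into two letters by this simple rule.

```python
import unicodedata


def split_letters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    letters: list[str] = []
    for ch in text:
        # General categories Mn and Mc cover the Tamil vowel signs and pulli.
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word "Tamil" in Unicode Tamil
print(len(word))                  # 5 code points
print(len(split_letters(word)))   # 3 letters
```

In a syllabary-based encoding such as TACE16 this grouping step is unnecessary, because each letter is already a single code point.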

Footnotes

  1. ^ Highlighted syllabograms in the U and ū columns are those where the vowel portion of the glyph matches neither the simple subjoining forms shown for those combining vowel marks in the Unicode block chart, nor the right-joining Grantha forms (as used for those combining vowel marks in isolation by, for example, Noto fonts).
  2. ^ Except for Tibetan, which uses a different model, and for Thai and related scripts, which use a model derived from TIS-620.

References

  1. ^ a b c d REPORT ON THE FINAL RECOMMENDATIONS OF THE TASK FORCE ON TACE16 (PDF) (Report).
  2. ^ a b "TENDER DOCUMENT for Development of Tamil Fonts and Tamil Keyboard driver for 16-bit encodings (Unicode and TACE16)" (PDF). Tamil Virtual Academy.
  3. ^ a b "????? ??????????????". ????? ?????? ???????????? TAMIL VIRTUAL ACADEMY.
  4. ^ a b Tamil Nadu Government's Order(G.O.), Keyboard Drivers and Fonts Archived 27 December 2023 at archive.today
  5. ^ Tamil Virtual Academy. "Annexure 4: Typewriter Extended Keyboard Sequence for Unicode and TACE16" (PDF). Tender Document for Development of Tamil Fonts and Tamil Keyboard driver for 16-bit encodings (Unicode and TACE16). Chennai.
  6. ^ a b c d "FAQ - Tamil Language and Script". Unicode Consortium.
  7. ^ a b Yergeau, F. (1998). UTF-8, a transformation format of ISO 10646. IETF. doi:10.17487/rfc2279. RFC 2279.
  8. ^ "Unicode Character Encoding Stability Policies". Unicode Consortium.
  9. ^ West, Andrew (2025-08-06). "Precomposed Tibetan Part 1 : BrdaRten". BabelStone.
  10. ^ China National Body (2025-08-06). "China's Statement of BrdaRten ad hoc". ISO/IEC JTC1/SC2/WG2 N2674.
  11. ^ Karlsson, Kent (2025-08-06). "Comments on DPRK New Work Item proposal on Korean characters". ISO/IEC JTC1/SC2/WG2 N2167.
  12. ^ Cho, Chun-Hui (2025-08-06). "DPRK letter on character names and ordering in 10646-1: 2000" (PDF). ISO/IEC JTC1/SC2/WG2 N2231.
  13. ^ Anantham, A.R.Amaithi (2025-08-06). "Fresh Encoding Proposals" (PDF). Unicode.
  14. ^ "Archive of Notices of Non-Approval". Unicode. 2025-08-06.
  15. ^ Annamalai, M.; Arulalan, T., Open-Tamil: Tamil language text processing tools for Python v3, retrieved 2025-08-06