Emacs 23 uses a character set four times larger than Unicode - why?
From Emacs 23.1 NEWS:
*** The Emacs character set is now a superset of Unicode. (It has about four times the code space, which should be plenty).
And more details later on:
*** In multibyte buffers and strings, characters are represented by UTF-8 byte sequences. The character code space is now 0x0..0x3FFFFF with no gap; code points 0x0..0x10FFFF are Unicode characters of the same code points, while code points 0x3FFF80..0x3FFFFF are raw 8-bit bytes.
According to Wikipedia, the BMP of the UCS has 65536 code points, the latest version of Unicode contains more than 107000 characters, and the UCS has more than one million code points. 0x3FFFFF is more than four million.
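To make the arithmetic concrete, here is a small sketch that only restates the ranges quoted from NEWS. The constant names and the classify helper are made up for illustration and are not Emacs identifiers.

```python
# Ranges quoted in the Emacs 23.1 NEWS entry; the names are illustrative only.
UNICODE_MAX = 0x10FFFF      # last Unicode code point
EMACS_MAX = 0x3FFFFF        # last code in the Emacs character code space
RAW_BYTE_START = 0x3FFF80   # first code reserved for raw 8-bit bytes

unicode_size = UNICODE_MAX + 1                    # 1,114,112 code points
emacs_size = EMACS_MAX + 1                        # 4,194,304 codes, i.e. 2**22
raw_byte_count = EMACS_MAX - RAW_BYTE_START + 1   # 128 codes
ratio = emacs_size / unicode_size                 # roughly 3.76, "about four times"


def classify(code):
    """Say which part of the code space a code falls in (hypothetical helper)."""
    if not 0 <= code <= EMACS_MAX:
        raise ValueError("outside the Emacs character code space")
    if code <= UNICODE_MAX:
        return "unicode"
    if code >= RAW_BYTE_START:
        return "raw-byte"
    return "beyond-unicode"


print(unicode_size, emacs_size, raw_byte_count, round(ratio, 2))
```

The 128 raw-byte codes are exactly enough for the byte values 0x80..0xFF, which is presumably what "raw 8-bit bytes" refers to, since bytes below 0x80 already coincide with ASCII.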
What problems does this solve, or how else is it beneficial for the internal character set to be a superset of Unicode?
Comments (1)
Unicode is designed to encompass the character sets required for all human languages, which is certainly useful for globalisation/localisation of your code. But because Emacs is the tool of the gods themselves, it also has to encompass every character that may be used by deities of all kinds (including but not limited to the eldritch runes of the Great Old Ones), spacefaring races (including but not limited to our future alien overlords), ultra-intelligent machine intelligences (including but not limited to our future robot masters) and every other being that desires infinite cosmic power. That is potentially a whole lot of characters!
Or it could have to do with UTF-8 being an encoding scheme with much more room than the Unicode set takes up, and Emacs simply supporting the whole of UTF-8, but I prefer my explanation above.
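The second guess can be checked roughly against the byte-length limits of the original, generalized UTF-8 scheme, which allowed sequences of up to 6 bytes before the standard was restricted to 4 bytes and U+10FFFF. The sketch below uses those generic limits; it is not taken from the Emacs sources, and the function name is made up for illustration.

```python
# Largest code point representable in n bytes under the original
# (pre-restriction) UTF-8 scheme: 7, 11, 16, 21, 26 and 31 payload bits.
# Generic limits only; not necessarily the exact layout Emacs uses internally.
GENERALIZED_UTF8_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def generalized_utf8_length(code_point):
    """Return how many bytes the generalized scheme needs for code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for length, limit in enumerate(GENERALIZED_UTF8_LIMITS, start=1):
        if code_point <= limit:
            return length
    raise ValueError("code point needs more than 31 bits")


for cp in (0x7F, 0x10FFFF, 0x1FFFFF, 0x3FFFFF):
    print(hex(cp), generalized_utf8_length(cp))
```

Under these limits the 4-byte form alone already reaches 0x1FFFFF, nearly twice the Unicode range, and 0x3FFFFF fits in a 5-byte form, so the encoding scheme does have room for an Emacs-sized code space. Whether that is the actual motivation is not something the NEWS entry states.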