Why do we need both the UCS and the Unicode character set?
I guess the codepoints of UCS and Unicode are the same, am I right?
In that case, why do we need two standards (UCS and Unicode)?
They are not two standards. The Universal Character Set (UCS) is not a standard but something defined in a standard, namely ISO 10646. This should not be confused with encodings, such as UCS-2.
It is difficult to guess whether you actually mean different encodings or different standards. But regarding the latter: Unicode and ISO 10646 were originally two distinct standardization efforts with different goals and strategies. They were harmonized in the early 1990s, however, to avoid the confusion that two diverging standards would have caused, and they have been coordinated so that the code points are indeed the same.
They were kept distinct, though, partly because Unicode is defined by an industry consortium that can work flexibly and has great interest in standardizing things beyond simple code point assignments. The Unicode Standard defines a large number of principles and processing rules, not just the characters. ISO 10646 is a formal standard that can be referenced in standards and other documents of the ISO and its members.
The codepoints are the same but there are some differences.
From the Wikipedia entry about the differences between Unicode and ISO 10646 (i.e. UCS):
You might find it useful to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
I think the differences come from the way the code points are encoded. The UCS-x encodings use a fixed number of bytes per code point; for example, UCS-2 uses two bytes, which means it cannot encode any code point that needs more than two bytes (i.e. anything above U+FFFF). The UTF encodings, on the other hand, use a variable number of bytes: UTF-8, for example, uses a single byte for ASCII characters but more bytes for characters outside the ASCII range.
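A quick sketch in Python illustrating the point above (the sample characters are my own choices, not from the original answer): UTF-8 spends between one and four bytes per code point, and a code point above U+FFFF simply does not fit in a fixed two-byte unit, which is why UCS-2 cannot represent it while UTF-16 resorts to a surrogate pair.

```python
# Byte lengths of a few code points in UTF-8 (variable-width encoding).
samples = [
    ("A", "U+0041, ASCII"),
    ("é", "U+00E9, Latin-1 range"),
    ("€", "U+20AC, Basic Multilingual Plane"),
    ("\U0001D11E", "U+1D11E, musical G clef, outside the BMP"),
]

for ch, desc in samples:
    print(f"{desc}: {len(ch.encode('utf-8'))} byte(s) in UTF-8")

# U+1D11E is above U+FFFF, so it cannot be stored in a single 16-bit
# UCS-2 unit; UTF-16 encodes it as a surrogate pair (4 bytes).
clef = "\U0001D11E"
assert ord(clef) > 0xFFFF
print(f"UTF-16: {len(clef.encode('utf-16-be'))} bytes (surrogate pair)")
```

Running this shows UTF-8 lengths of 1, 2, 3, and 4 bytes respectively, making the variable-width behaviour concrete.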