Which Unicode normalization form is better?
I have four options on Dreamweaver: C, D, KC, KD. Which one should I choose and why?
For what? For saving a file, use NFC, as the web character model uses it (strictly, the W3C normalisation insists both that the stream be in NFC and that the text still be in NFC once entities in HTML or XML are converted to the characters they represent). The odds that it'll ever make a practical difference are slim, though it could stop a few rather obscure issues upsetting someone down the line.
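If you ever need to apply the same advice outside Dreamweaver, a minimal sketch in Python could look like the following. It assumes the standard-library `unicodedata` module; the helper name `save_as_nfc` and the file name `page.html` are made up for illustration and are not part of any tool mentioned here.

```python
import tempfile
import unicodedata
from pathlib import Path


def save_as_nfc(path, text):
    """Hypothetical helper: normalise text to NFC, then write it as UTF-8."""
    normalized = unicodedata.normalize("NFC", text)
    Path(path).write_text(normalized, encoding="utf-8")
    return normalized


# Usage: "cafe" + combining acute (5 code points) is stored as "café" (4 code points).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "page.html"  # illustrative file name
    written = save_as_nfc(target, "cafe\u0301")
    on_disk = target.read_text(encoding="utf-8")
```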
Normalisation makes certain equivalent sequences result in identical streams. For example, U+0065 (e) followed by U+0301 (a combining acute accent) is equivalent to U+00E9 (é) on its own.
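The equivalence above can be checked with a short sketch, here using Python's standard `unicodedata` module (the variable names are only illustrative). The two spellings compare unequal as raw strings, but become identical once both are put through the same normalisation form.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute accent)
precomposed = "\u00e9"   # U+00E9 on its own

# As raw code point sequences the two strings differ.
raw_equal = decomposed == precomposed

# After normalising both to the same form, they are identical.
nfc_equal = (unicodedata.normalize("NFC", decomposed)
             == unicodedata.normalize("NFC", precomposed))
nfd_equal = (unicodedata.normalize("NFD", decomposed)
             == unicodedata.normalize("NFD", precomposed))

print(raw_equal, nfc_equal, nfd_equal)
```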
NFD splits all such strings up into their component parts (e.g. turning U+00E9 into U+0065 followed by U+0301). If there are two or more combining characters in a row, they are re-ordered according to rules that give a consistent result (ḉ could have the cedilla followed by the acute or the acute followed by the cedilla, and we need a consistent ordering for the same string to be produced). Mostly NFD is useful for internal processing as part of another task, such as stripping accents, or producing NFC.
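As a rough illustration of both points, the sketch below uses Python's standard `unicodedata` module. It shows the two mark orders for ḉ collapsing to one NFD sequence, and a simple accent stripper built on NFD. The function name `strip_accents` is hypothetical, and dropping every combining mark is only one assumed way to approach that task.

```python
import unicodedata

# Two ways to type c + cedilla + acute, differing only in mark order.
cedilla_first = "c\u0327\u0301"
acute_first = "c\u0301\u0327"

# NFD sorts the marks by combining class, so both give the same sequence.
nfd_a = unicodedata.normalize("NFD", cedilla_first)
nfd_b = unicodedata.normalize("NFD", acute_first)

# The precomposed character U+1E09 decomposes to that same sequence.
nfd_c = unicodedata.normalize("NFD", "\u1e09")


def strip_accents(text):
    """Hypothetical helper: decompose, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


stripped = strip_accents("r\u00e9sum\u00e9")  # "résumé"
```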
NFC starts with NFD and then combines the characters together again where possible, barring a few exceptions to ensure that what was a normalised string with one version of Unicode remains so with another.
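A small sketch of this behaviour, again with Python's standard `unicodedata` module. The specific characters are my own picks rather than examples from the answer above: U+0958 is, as far as I know, one of the characters on Unicode's composition exclusion list, and U+212B is a character whose canonical form is a different single code point.

```python
import unicodedata

# Ordinary case: the decomposed sequence is put back together.
recomposed = unicodedata.normalize("NFC", "e\u0301")

# Marks typed in the other order are sorted first, then composed into U+1E09.
c_cedilla_acute = unicodedata.normalize("NFC", "c\u0301\u0327")

# Exception: U+0958 (DEVANAGARI LETTER QA) is excluded from composition,
# so NFC leaves it as U+0915 followed by U+093C.
qa = unicodedata.normalize("NFC", "\u0958")

# U+212B (ANGSTROM SIGN) normalises to U+00C5 and is never recreated.
angstrom = unicodedata.normalize("NFC", "\u212b")
```

So NFC output is not always the shortest possible spelling; a few characters stay decomposed or are replaced by their canonical equivalent.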
NFKD goes further than NFD by also replacing certain "compatibility" characters with the plainer characters they resemble. For example, ⁵ is replaced with 5. This "damages" the text (a user may reasonably choose ⁵ over 5 for a good reason) but is useful for searching (search for "fiſh" on Google and it returns results for "fish" because it treats the long s the same as a short s) and as a restriction in certain cases to avoid security issues with similar but different characters. NFKC first does NFKD and then combines in the same manner as NFC.
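To make the difference from NFD and NFC concrete, here is a minimal sketch with Python's standard `unicodedata` module. The `search_key` helper is hypothetical: it is one assumed way to build a lossy comparison key for searching, not a description of how any search engine actually works.

```python
import unicodedata

# NFD keeps the superscript; NFKD replaces it with a plain digit.
superscript_nfd = unicodedata.normalize("NFD", "x\u2075")
superscript_nfkd = unicodedata.normalize("NFKD", "x\u2075")

# The long s (U+017F) is folded to an ordinary "s" by the K forms.
long_s = unicodedata.normalize("NFKC", "fi\u017fh")

# NFKC applies the same replacements, then composes as NFC would.
mixed = unicodedata.normalize("NFKC", "e\u0301\u2075")


def search_key(text):
    """Hypothetical helper: lossy key for matching, not for storing text."""
    return unicodedata.normalize("NFKC", text).casefold()


same_key = search_key("Fi\u017fh") == search_key("FISH")
```

Because the K forms throw information away, it is usually safer to keep the original text and use a key like this only for comparison.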
See http://unicode.org/reports/tr15/ for the full skinny, and "use NFC but don't worry about it" to repeat the short answer.