Where can I get examples of Unicode that normalizes differently?
I'm adding yet another Unicode normalization question because I've spent quite a bit of time looking and can't find what I need. I have a situation where I need to normalize Unicode to check whether strings are equivalent, but I don't understand the consequences of choosing different normal forms. What I would like is some example valid Unicode input that normalizes differently, so I can play around with the different options, but I don't know how to make it or where I could find it. This answer has some example data, but the examples are focused on malformed or invalid Unicode strings (I think? Maybe I don't know what I'm looking at). I need a set of strings that users will expect to be equivalent, that an interface will accept as valid, and that are not equal until normalized. Let's say UTF-8 to be specific, but I'd appreciate examples for multiple encodings. I'm working with Python, in case any answers depend on the implementation, but I imagine others might appreciate answers that are not limited to Python.
Where can I get example Unicode strings that are equivalent under some normal forms and not others, preferably demonstrating how all the normalizations differ?
Comments (1)
https://unicode.org/reports/tr15/#Norm_Forms has a good number of examples, and a significant amount of explanation around them.
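If it helps to experiment locally, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The sample pairs and the helper name `equal_under` are my own choices for illustration, not taken from the report; they are picked so that some pairs become equal under every normal form (canonical equivalence) while others only become equal under the compatibility forms NFKC and NFKD.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Illustrative pairs (my own selection): each pair looks "the same" to many
# users, but the two strings are not equal before normalization.
SAMPLES = {
    "precomposed e-acute vs. e + combining acute": ("\u00e9", "e\u0301"),
    "angstrom sign vs. A with ring above": ("\u212b", "\u00c5"),
    "fi ligature vs. plain 'fi'": ("\ufb01", "fi"),
    "superscript two vs. digit two": ("\u00b2", "2"),
}


def equal_under(a, b):
    """Return, for each normal form, whether a and b normalize to the same string."""
    return {
        form: unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    }


if __name__ == "__main__":
    for label, (a, b) in SAMPLES.items():
        print(label)
        print("  equal before normalizing:", a == b)
        print("  UTF-8 bytes:", a.encode("utf-8"), "vs.", b.encode("utf-8"))
        print("  equal after normalizing:", equal_under(a, b))
```

The first two pairs should compare equal under all four forms, while the ligature and superscript pairs should only compare equal under NFKC and NFKD. That difference is the main consequence to weigh when choosing a form for equivalence checks. Swapping `"utf-8"` for another codec such as `"utf-16-le"` in the `encode` calls shows the same strings in other encodings.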