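List of Unicode alphabetic characters
I need the list of Unicode character ranges that have the Alphabetic property, as defined at http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Alphabetic. However, no matter how I search, I cannot find them in the Unicode Character Database. Can someone provide a list of them, or a search tool that returns only the characters with a specified Unicode property?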
The Unicode Character Database comprises all the text files in the distribution. It is no longer a single file, as it was long ago.
The Alphabetic property is a derived property.
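Because it is derived, you will not find Alphabetic in UnicodeData.txt; the UCD ships the precomputed ranges in DerivedCoreProperties.txt. Below is a minimal sketch of pulling them out with Python, assuming you have downloaded that file yourself. The function name `alphabetic_ranges` and the sample lines are illustrative only; the sample just mimics the file's `start..end ; Property # comment` layout.

```python
from typing import Iterable, List, Tuple

def alphabetic_ranges(lines: Iterable[str], prop: str = "Alphabetic") -> List[Tuple[int, int]]:
    """Collect inclusive (start, end) code point ranges for `prop` from UCD-style lines."""
    ranges: List[Tuple[int, int]] = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue  # line belongs to a different property
        start, _, end = fields[0].partition("..")
        # A single code point has no "..", so the range ends where it starts
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges

# Illustrative lines in the DerivedCoreProperties.txt layout (not the real file)
sample = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Alphabetic # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

for lo, hi in alphabetic_ranges(sample.splitlines()):
    print(f"U+{lo:04X}..U+{hi:04X}")

# With the real file:
# with open("DerivedCoreProperties.txt", encoding="utf-8") as f:
#     ranges = alphabetic_ranges(f)
```

The Uppercase line is skipped because only the requested property is kept.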
You really do not want to use code point ranges for this; you want to use the property itself, because there are simply too many code points. Using the unichars script, we learn that there are more than ten thousand in the Basic Multilingual Plane alone, not counting Han or Hangul.
If we include the other 16 astral planes, we are now at fourteen thousand.
And if we include Han and Hangul, as the Alphabetic property in fact does, we blow the roof off a hundred thousand code points.
I hope you can see that you do not want to specifically enumerate these using code point ranges. Down that road lies madness.
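To get a feel for the scale without enumerating anything by hand, here is a rough sketch using Python's standard `unicodedata` module. It counts only the general-category part of the definition (Lu, Ll, Lt, Lm, Lo, Nl) and ignores Other_Alphabetic, which `unicodedata` does not expose, so it slightly undercounts; it also does not separate out Han or Hangul. The totals depend on the Unicode version bundled with your Python, and the name `count_letterlike` is hypothetical.

```python
import sys
import unicodedata

# General categories named in the Alphabetic derivation (Other_Alphabetic omitted)
LETTERLIKE = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

def count_letterlike(limit: int) -> int:
    """Count code points below `limit` whose general category is in LETTERLIKE."""
    return sum(1 for cp in range(limit) if unicodedata.category(chr(cp)) in LETTERLIKE)

if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    print("BMP only:  ", count_letterlike(0x10000))
    print("All planes:", count_letterlike(sys.maxunicode + 1))
```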
By the way, if you find the unichars script useful, you might also like the uniprops script and perhaps the uninames script.
Derived Core Properties can be calculated from the other properties.
The Alphabetic property is defined as "Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic".
So, if you take all the characters in Lu, Ll, Lt, Lm, Lo, Nl, and all the characters with the Other_Alphabetic property, you will have the Alphabetic characters.
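The definition translates almost directly into code. Here is a minimal Python sketch, assuming you supply the Other_Alphabetic code points yourself, for example parsed from the UCD's PropList.txt, where that contributory property is listed. The hard-coded `SAMPLE_OTHER_ALPHABETIC` set holds only U+0345 and the circled Latin letters U+24B6..U+24E9 as a small illustration, not the full set; `is_alphabetic` is a hypothetical name.

```python
import unicodedata
from typing import AbstractSet

# General categories in the derivation: Lu + Ll + Lt + Lm + Lo + Nl
ALPHABETIC_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})

# Small sample of Other_Alphabetic code points; load the full set from PropList.txt
SAMPLE_OTHER_ALPHABETIC = frozenset({0x0345, *range(0x24B6, 0x24EA)})

def is_alphabetic(ch: str, other_alphabetic: AbstractSet[int] = SAMPLE_OTHER_ALPHABETIC) -> bool:
    """Alphabetic = Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic."""
    return unicodedata.category(ch) in ALPHABETIC_CATEGORIES or ord(ch) in other_alphabetic

for ch in ["A", "\u01C5", "\u4E2D", "\u216B", "\u0345", "\u24B6", "1", "!"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"alphabetic={is_alphabetic(ch)} isalpha={ch.isalpha()}")
```

Note that Python's own `str.isalpha()` is documented as covering only the Lm, Lt, Lu, Ll and Lo categories, so it reports False for ROMAN NUMERAL TWELVE (Nl) and CIRCLED LATIN CAPITAL LETTER A (So, Other_Alphabetic), even though both are Alphabetic.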
Quote from your source:
Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic
These abbreviations seem to be explained here.
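For reference, Lu, Ll, Lt, Lm, Lo and Nl are Unicode General_Category values (Uppercase_Letter, Lowercase_Letter, Titlecase_Letter, Modifier_Letter, Other_Letter and Letter_Number), while Other_Alphabetic is a separate binary property rather than a category. Python's standard `unicodedata.category()` returns these codes, so a quick check might look like this; the sample characters are an arbitrary choice.

```python
import unicodedata

# One sample character per general category used in the Alphabetic derivation
SAMPLES = {
    "Lu": "A",       # U+0041
    "Ll": "a",       # U+0061
    "Lt": "\u01C5",  # a titlecase digraph
    "Lm": "\u02B0",  # a modifier letter
    "Lo": "\u4E2D",  # a CJK ideograph
    "Nl": "\u2163",  # a Roman numeral
}

for expected, ch in SAMPLES.items():
    actual = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {actual} (expected {expected})")
```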
I found the UniView web application, which provides a nice search interface. Searching for the Letter property (with Local unchecked) gives 14723 results...