List of Unicode alphabetic characters

Posted 2024-10-15 08:53:55

I need the list of ranges of Unicode characters with the property Alphabetic as defined in http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Alphabetic. However, I cannot find them in the Unicode Character Database no matter how I search for them. Can somebody provide a list of them or just a search facility for characters with specified Unicode properties?

Comments (4)

蓝天白云 2024-10-22 08:53:55

The Unicode Character Database comprises all the text files in the distribution. It is not just a single file as it once was long ago.

The Alphabetic property is a derived property.
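Because it is derived, the property is listed in the UCD file DerivedCoreProperties.txt rather than in UnicodeData.txt. The following is only a minimal sketch of reading the Alphabetic ranges out of that file's text; the function name parse_property_ranges and the three-line SAMPLE excerpt are illustrative assumptions, not part of the UCD or of any library.

```python
def parse_property_ranges(text, prop="Alphabetic"):
    """Return inclusive (start, end) code point ranges listed for `prop`.

    Expects text in the UCD data-file layout:
        0041..005A    ; Alphabetic # comment
        00AA          ; Alphabetic # comment
    """
    ranges = []
    for line in text.splitlines():
        # Everything after '#' is a comment.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # The first field is either a single code point or "first..last".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end))
    return ranges


# Hypothetical excerpt for illustration; a real run would read the whole file.
SAMPLE = """\
002B          ; Math # Sm       PLUS SIGN
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
"""

if __name__ == "__main__":
    for start, end in parse_property_ranges(SAMPLE):
        print(f"{start:04X}..{end:04X}")
```

Run against the full file, this yields a long list of ranges, which is exactly the size problem described next.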

You really do not want to use code point ranges for this, because there are just too many of them. You want to use the property itself. Using the unichars script, we learn that there are more than ten thousand in the Basic Multilingual Plane alone, not counting Han or Hangul:

$ unichars '\p{Alphabetic}' | wc -l
   10052

If we include the other 16 astral planes, we are at more than fourteen thousand:

$ unichars -a '\p{Alphabetic}' | wc -l
   14736

And if we include Han and Hangul, which the Alphabetic property does in fact cover, we blow past a hundred thousand code points:

$ unichars -ua '\p{Alphabetic}' | wc -l
  101539

I hope you can see that you do not want to specifically enumerate these using code point ranges. Down that road lies madness.

By the way, if you find the unichars script useful, you might also like the uniprops script and perhaps the uninames script.

墨离汐 2024-10-22 08:53:55

Derived Core Properties can be calculated from the other properties.

The Alphabetic property is defined as: Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic

So, if you take all the characters in Lu, Ll, Lt, Lm, Lo, Nl, and all the characters with the Other_Alphabetic property, you will have the Alphabetic characters.
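As a rough sketch of that recipe in Python, the standard unicodedata module can supply the general-category part. It does not expose Other_Alphabetic, so the result below is only a subset of the full Alphabetic set, and it reflects whatever Unicode version the interpreter ships (see unicodedata.unidata_version). The names ALPHABETIC_CATEGORIES and category_ranges are assumptions made for this example.

```python
import sys
import unicodedata

# General categories that feed into Alphabetic, per the definition above.
# Other_Alphabetic is NOT covered: the stdlib does not expose that property.
ALPHABETIC_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})


def category_ranges(categories=ALPHABETIC_CATEGORIES, limit=sys.maxunicode + 1):
    """Return inclusive (start, end) ranges of code points below `limit`
    whose general category is in `categories`."""
    ranges = []
    start = None
    for cp in range(limit):
        inside = unicodedata.category(chr(cp)) in categories
        if inside and start is None:
            start = cp                      # a new run begins
        elif not inside and start is not None:
            ranges.append((start, cp - 1))  # the run ended just before cp
            start = None
    if start is not None:
        ranges.append((start, limit - 1))   # run reaches the end of the scan
    return ranges


if __name__ == "__main__":
    # Restrict the scan to ASCII to keep the output short.
    for start, end in category_ranges(limit=0x80):
        print(f"{start:04X}..{end:04X}")
```

Scanning the full code space this way takes a moment and produces many ranges; the missing Other_Alphabetic code points would have to come from PropList.txt in the UCD.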

也只是曾经 2024-10-22 08:53:55

Citation from your source: Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic

These abbreviations seem to be explained here.

木格 2024-10-22 08:53:55

I found the UniView web application which provides a nice search interface. Searching for the Letter property (with Local unchecked) gives 14723 results...
