Unicode character-properties alphabetic

List of Unicode alphabetic characters

Posted 2024-10-15 08:53:55, 288 characters, 11 views, 0 comments

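I need a list of the Unicode character ranges that have the Alphabetic property, as described at http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Alphabetic. However, no matter how I search, I cannot find them in the Unicode Character Database. Can someone provide a list of them, or a search tool that finds the characters with a given Unicode property?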

蓝天白云 2024-10-22 08:53:55

The Unicode Character Database comprises all the text files in the distribution. It is not just a single file as it once was long ago.

The Alphabetic property is a derived property.
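Derived properties such as Alphabetic are published in the UCD file DerivedCoreProperties.txt, one code point or range per line. As a minimal sketch of how such lines can be read, the Python below parses a short embedded excerpt rather than the real file; the names `SAMPLE`, `parse_property_ranges`, and `count_code_points` are made up for this illustration, and the sketch assumes the usual `range ; property # comment` layout.

```python
# Hypothetical excerpt in the layout of DerivedCoreProperties.txt.
# The real file is far longer; these few lines are only for illustration.
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Alphabetic # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR

0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
"""


def parse_property_ranges(text, prop):
    """Return (start, end) code point pairs listed for the given property."""
    ranges = []
    for line in text.splitlines():
        # Everything after '#' is a comment; blank lines carry no data.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # A line holds either a single code point or "first..last".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end))
    return ranges


def count_code_points(ranges):
    """Count the code points covered by inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


alphabetic_ranges = parse_property_ranges(SAMPLE, "Alphabetic")
print(len(alphabetic_ranges), count_code_points(alphabetic_ranges))
```

Run against the full file, the same function would return the complete range list, though as the rest of this answer argues, matching on the property is usually the better choice.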

You really do not want to use code point ranges for this. You want to use the property properly. That’s because there are just too many code points. Using the unichars script, we learn that there are more than ten thousand in the Basic Multilingual Plane alone, not counting Han or Hangul:

$ unichars '\p{Alphabetic}' | wc -l
   10052

If we include the other 16 astral planes, now we’re at fourteen thousand:

$ unichars -a '\p{Alphabetic}' | wc -l
   14736

And if we include Han and Hangul, which the Alphabetic property does in fact cover, we blow right past a hundred thousand code points:

$ unichars -ua '\p{Alphabetic}' | wc -l
  101539

I hope you can see that you do not want to specifically enumerate these using code point ranges. Down that road lies madness.

By the way, if you find the unichars script useful, you might also like the uniprops script and perhaps the uninames script.

墨离汐 2024-10-22 08:53:55

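Derived core properties can be computed from other properties.

The Alphabetic property is defined as: Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic

So if you take all the characters in Lu, Ll, Lt, Lm, Lo, and Nl, plus those with the Other_Alphabetic property, you will have the alphabetic characters.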
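The general-category part of that formula can be sketched in Python with the standard `unicodedata` module. This is only an approximation under a stated assumption: the standard library does not expose Other_Alphabetic, so characters that are Alphabetic solely through that property (for example U+0345 COMBINING GREEK YPOGEGRAMMENI, category Mn) are missed. The names `ALPHABETIC_CATEGORIES` and `is_roughly_alphabetic` are invented for this example.

```python
import unicodedata

# General categories named in the derivation of Alphabetic.
# Other_Alphabetic is NOT covered here, so this undercounts.
ALPHABETIC_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_roughly_alphabetic(ch):
    """True if the character's general category is one of Lu, Ll, Lt, Lm, Lo, Nl."""
    return unicodedata.category(ch) in ALPHABETIC_CATEGORIES


for ch in ["A", "\u00e9", "\u2163", "1", "\u0345"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_roughly_alphabetic(ch))
```

The results depend on the Unicode version bundled with the Python interpreter, so counts obtained this way will not necessarily match those from other tools.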

也只是曾经 2024-10-22 08:53:55

A quote from your source: Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic

These abbreviations appear to be explained here.

木格 2024-10-22 08:53:55

I found that the UniView web application provides a good search interface. Searching for the Letter property (with Local unchecked) gives 14723 results...
