Enumerating a character's Unicode properties in Ruby?
Is there any way to enumerate all of a character's Unicode properties in Ruby? I can use Ruby 1.9's Regexp class to test whether a given character has a particular property (e.g., some_char =~ /\p{P}/ to test whether some_char is punctuation, etc.)... but since characters can have multiple properties ("(", for example, is both punctuation and ASCII, etc.), it would be nice to just be able to get a list of all of a character's properties.

I could probably do this by hand using unicode_data.txt, or whatever it's called, but this seems like the sort of thing that's probably already been done somewhere. UnicodeUtils doesn't appear to have anything along these lines, and Googling didn't turn up anything obvious. Thanks!
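
For illustration, the single-property regexp test described in the question could be turned into a brute-force enumeration by probing a character against a list of property names. This is only a sketch: CANDIDATE_PROPERTIES and matching_properties are hypothetical names, and the candidate list is a small hand-picked sample, not the full set of properties that Ruby's regexp engine knows about.

```ruby
# Candidate property names to probe. This list is an illustrative assumption,
# not the complete set of properties supported by Ruby's regexp engine.
CANDIDATE_PROPERTIES = %w[
  Alnum Alpha ASCII Digit Lower Upper Punct Space
  L Lu Ll N Nd P Ps Pe Po S Sm Z
].freeze

# Hypothetical helper: returns the candidate property names that match the
# given single-character string.
def matching_properties(char, candidates = CANDIDATE_PROPERTIES)
  candidates.select do |name|
    begin
      # Anchor the pattern so that only a one-character string can match.
      Regexp.new("\\A\\p{#{name}}\\z") =~ char
    rescue RegexpError
      false # this Ruby build does not recognise the property name
    end
  end
end

puts matching_properties("(").inspect
```

For "(", the result should include ASCII, P, and Ps, among others. The obvious limitation is that the answer is only as complete as the candidate list, which is why a tool backed by the Unicode Character Database is preferable for a full enumeration.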
You can call out to my uniprops script.
You will probably also want to get unichars, so that you can go the other way. Here are just a few examples of calling it:
Here is one example of the output:
etc.
I describe these in the first of my OSCON Unicode talks. They are just two of the tools in a suite of a couple of dozen.
There is a unicode_data.txt interface by runpaint, which works well, but describes itself as a "very early draft".
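
As a rough sketch of what reading that data by hand involves (this is not runpaint's API, which is not shown here), each record in the Unicode Character Database file UnicodeData.txt is a semicolon-separated line whose first three fields are the code point in hexadecimal, the character name, and the general category. The helper name parse_unicode_data_record is hypothetical, and the sample record for "(" is embedded so the example runs without the file.

```ruby
# One record from UnicodeData.txt, embedded so that the sketch is
# self-contained. Real code would read the lines from the file instead.
SAMPLE_RECORD = "0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;"

# Hypothetical helper: extracts the first three fields of one record.
def parse_unicode_data_record(line)
  fields = line.chomp.split(";", -1) # -1 keeps the trailing empty fields
  {
    codepoint: fields[0].to_i(16),   # field 0: code point in hexadecimal
    name: fields[1],                 # field 1: character name
    general_category: fields[2]      # field 2: general category, e.g. "Ps"
  }
end

record = parse_unicode_data_record(SAMPLE_RECORD)
puts format("U+%04X %s (%s)", record[:codepoint], record[:name], record[:general_category])
# prints "U+0028 LEFT PARENTHESIS (Ps)"
```

Note that this file only carries per-character fields such as the general category; properties like script or block membership live in other files of the database, so this alone does not yield a complete property list.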