How do I get a list of all Unicode characters that have a given property?
Without looping over the entire range of Unicode characters, how can I get a list of characters that have a given property? In particular, I want a list of all characters that are digits (i.e. those that match /\d/). I have looked at Unicode::UCD, and it is useful for determining the properties of a given character, but there doesn't seem to be a way to get a list of the characters that have a given property out of it.
Answers (4)
The list of Unicode characters for each class is generated from the Unicode spec when Perl is built, and is typically stored in /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/
For example, the list of Unicode character ranges that match IsDigit (a.k.a. \d) is stored in the file /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/Digit.pl.
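Reading those generated files directly ties a script to one Perl installation. As a hedged alternative that is not part of the answer above, newer versions of Unicode::UCD (shipped with Perl 5.16 and later, as far as I know) export prop_invlist, which returns the same kind of range data as an inversion list. The sketch below assumes that \d corresponds to General_Category=Decimal_Number, and the @ranges variable is only an illustrative name.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist);

# Inversion list for the decimal-digit property. Even-indexed elements
# start a range that IS in the property; odd-indexed elements start a
# range that is NOT.
my @invlist = prop_invlist("General_Category=Decimal_Number");

# Convert the inversion list into [start, end] pairs (inclusive).
my @ranges;
for (my $i = 0; $i < @invlist; $i += 2) {
    my $start = $invlist[$i];
    # An odd-length list means the final range runs to the last code point.
    my $end = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
    push @ranges, [$start, $end];
}

printf "U+%04X..U+%04X\n", @$_ for @ranges;
```

No loop over the whole code point range is needed here; the number of ranges printed depends on the Unicode version your Perl was built with.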
Even better than unicore/lib/gc_sc/Digit.pl is unicore/To/Digit.pl. It is a direct mapping of Unicode digit characters (well, really their offsets) to their numeric values, so the numeric value of a digit can be looked up directly.
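A digit-to-value mapping of this kind can also be built without touching the unicore files. This is only a sketch under assumptions, not the code from this answer: it relies on prop_invlist and num from Unicode::UCD (available in Perl 5.16 and later, to my knowledge), and %value_of is a hypothetical name.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist num);

# Ranges of decimal digits, as an inversion list.
my @invlist = prop_invlist("General_Category=Decimal_Number");

# Map each digit code point to its numeric value (0..9).
my %value_of;
for (my $i = 0; $i < @invlist; $i += 2) {
    my $end = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
    for my $cp ($invlist[$i] .. $end) {
        # num() returns the numeric value of a string of digits.
        $value_of{$cp} = num(chr $cp);
    }
}

# U+0663 is ARABIC-INDIC DIGIT THREE.
printf "U+0663 has value %d\n", $value_of{0x0663};
```

The inner loop only visits code points already known to be digits, so the cost is proportional to the number of digits rather than to the size of Unicode.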
Which characters /\d/ matches depends entirely on your regexp implementation (although the standard 0-9 are guaranteed). In Perl's case, the locale in use defines which characters are considered alphabetic and which are considered digits.
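To illustrate how the set matched by /\d/ can vary even within Perl, here is a small sketch. It assumes Perl 5.14 or later, where the /a modifier restricts \d to ASCII 0-9; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $ascii_seven  = "7";
my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

# 1 if the pattern matched, 0 otherwise.
my %matches = (
    ascii_default  => ($ascii_seven  =~ /\d/  ? 1 : 0),
    arabic_default => ($arabic_three =~ /\d/  ? 1 : 0),   # Unicode rules
    arabic_ascii   => ($arabic_three =~ /\d/a ? 1 : 0),   # ASCII-only \d
);

print "$_: $matches{$_}\n" for sort keys %matches;
```

Without /a the Arabic-Indic digit matches; with /a it does not, while ASCII digits match either way.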
There is no way to do that without iterating through all the characters.
(If you create a huge string containing all of them and run a regexp over it, you still have to do the loop at least once to create the string.)
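For reference, the exhaustive loop this answer has in mind might look like the sketch below. It is an illustration, not code from the answer; skipping the surrogate range and silencing the utf8 warning category are my own precautions, and @digits is an illustrative name.

```perl
use strict;
use warnings;
no warnings 'utf8';   # the range includes code points that are not real characters

my @digits;
for my $cp (0 .. 0x10FFFF) {
    # Surrogates are not characters; skip them.
    next if $cp >= 0xD800 && $cp <= 0xDFFF;
    push @digits, $cp if chr($cp) =~ /\d/;
}

printf "%d code points match \\d; the first is U+%04X\n",
    scalar @digits, $digits[0];
```

This tests a little over a million code points once, so it is usually worth caching the result if it is needed more than once.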