Detect a single CJK character
I have a string that could be either an English word or a single CJK character. The string is guaranteed to be UTF-8 encoded. I am working inside a Perl script.
The higher-level problem is that I have an array of strings like the one described above, and I am doing join " ", @array. I want to know when a string is CJK so that I don't add the space.
So for CJK I will just do join "", @array.
I have looked around but can't find this exact question.
Thanks.
Comments (1)
You could use the regular expression \p{InCJK_Unified_Ideographs}. This is a Unicode block (as opposed to a Unicode script; Perl supports scripts too, but they don't seem to match your problem description). There are some other candidate blocks, such as CJK Unified Ideographs Extension A and CJK Radicals Supplement. Here's a full list.
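To make that concrete, here is a minimal sketch of how the block test could be combined with the join. The helper names is_single_cjk and join_words are hypothetical, and the spacing policy (a space only between two neighbours that are both non-CJK) is an assumption about what the question wants. It also assumes the strings arrive as raw UTF-8 bytes, so they are decoded to characters first; \p{...} properties only match reliably on decoded character strings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): true when $str, an
# already-decoded character string, is exactly one character from the
# three blocks named in the answer.
sub is_single_cjk {
    my ($str) = @_;
    return $str =~ /\A[\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Radicals_Supplement}]\z/
        ? 1 : 0;
}

# Hypothetical helper: join decoded strings, inserting a space only
# between two neighbours that are both non-CJK (an assumed policy).
sub join_words {
    my @items = @_;
    return '' unless @items;
    my $out      = shift @items;
    my $prev_cjk = is_single_cjk($out);
    for my $item (@items) {
        my $cjk = is_single_cjk($item);
        $out .= ' ' unless $prev_cjk || $cjk;
        $out .= $item;
        $prev_cjk = $cjk;
    }
    return $out;
}

# Sample input as raw UTF-8 bytes: two English words, then U+4E2D and U+6587.
my @raw   = ("hello", "world", "\xE4\xB8\xAD", "\xE6\x96\x87");
my @chars = map { decode('UTF-8', $_) } @raw;    # bytes -> characters

# Encode back to UTF-8 bytes before printing.
print encode('UTF-8', join_words(@chars)), "\n";
```

If every element of the array is known to be CJK, the plain join "", @array from the question is enough; the per-pair check only matters for mixed arrays. Adding further blocks to the character class works the same way.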