How do I use Unicode character classes in JavaScript regular expressions?

Posted 2024-12-27 23:59:41

Is there a way to use patterns like "\p{L}" in JavaScript, natively?

(I suppose that is a Perl-compatible syntax.)

I'm interested primarily in Firefox support, and possibly WebKit.

Comments (4)

恏ㄋ傷疤忘ㄋ疼 2025-01-03 23:59:41

No, \p{..} is not supported natively by any of the major browsers. However, it does work in JavaScript if you use the XRegExp library and its Unicode plugins.
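Whether a particular engine accepts \p{..} can also be checked at runtime before deciding on a library. The following is a minimal sketch, assuming an environment where the ECMAScript 2018 property escapes (which require the `u` flag) may or may not be available. `supportsUnicodeProperties` is a hypothetical helper name, not part of XRegExp or any other library.

```javascript
// Hypothetical helper: returns true when the engine accepts \p{..}
// in a regular expression compiled with the "u" flag.
function supportsUnicodeProperties() {
  try {
    // Build the pattern from a string so that an engine without support
    // throws a catchable SyntaxError at runtime rather than at parse time.
    new RegExp('\\p{L}', 'u');
    return true;
  } catch (e) {
    return false;
  }
}

if (supportsUnicodeProperties()) {
  // Native path: one or more letters from any script.
  const letter = new RegExp('^\\p{L}+$', 'u');
  console.log(letter.test('naïve'), letter.test('123'));
} else {
  console.log('No native \\p{..} support; use a library or explicit ranges.');
}
```

Constructing the pattern with `new RegExp` rather than a regex literal keeps the script loadable in engines that would reject the literal outright.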

谁的新欢旧爱 2025-01-03 23:59:41

Unfortunately, no. You can only specify a set of characters in the usual syntax, writing characters and ranges in brackets, but this becomes awkward because, for example, letters are scattered all around the Unicode space, with other characters between them.

There's an inefficient workaround: fetch the UnicodeData.txt file from the Unicode site, put its content inside your JavaScript code as data, and parse it. You could then hold the data in, say, an array of objects containing the Unicode properties, such as gc (General Category), which tells you whether a character is a letter or not. But even then, you would only have the data handy for simple testing, not as something you can use as a constituent of a regexp.

In theory, you could use the data to construct a regexp, but it would be rather large.
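The idea of constructing a regexp from that data can be sketched as follows. This is only an illustration under stated assumptions: the sample lines follow the UnicodeData.txt layout (code point; name; General Category) but are truncated after the third field, the helper names are hypothetical, only BMP code points are handled, and the range entries that the real file uses for large blocks are ignored.

```javascript
// Sample lines in UnicodeData.txt layout, truncated after the General
// Category field. A real run would parse the whole file instead.
const sample = [
  '0020;SPACE;Zs',
  '0030;DIGIT ZERO;Nd',
  '0041;LATIN CAPITAL LETTER A;Lu',
  '0042;LATIN CAPITAL LETTER B;Lu',
  '0043;LATIN CAPITAL LETTER C;Lu',
  '0061;LATIN SMALL LETTER A;Ll',
  '00E9;LATIN SMALL LETTER E WITH ACUTE;Ll',
  '0416;CYRILLIC CAPITAL LETTER ZHE;Lu'
].join('\n');

// Hypothetical helper: sorted code points whose gc starts with `prefix`.
function codePointsWithCategory(data, prefix) {
  return data.split('\n')
    .map(line => line.split(';'))
    .filter(fields => fields.length >= 3 && fields[2].startsWith(prefix))
    .map(fields => parseInt(fields[0], 16))
    .sort((a, b) => a - b);
}

// Merge runs of consecutive code points into [start, end] pairs.
function toRanges(points) {
  const ranges = [];
  for (const cp of points) {
    const last = ranges[ranges.length - 1];
    if (last && cp === last[1] + 1) last[1] = cp;
    else ranges.push([cp, cp]);
  }
  return ranges;
}

// \uXXXX escape; valid only for BMP code points (up to 0xFFFF).
function esc(cp) {
  return '\\u' + cp.toString(16).padStart(4, '0');
}

// Turn the ranges into the source text of a bracketed character class.
function buildClass(ranges) {
  return '[' + ranges
    .map(([a, b]) => (a === b ? esc(a) : esc(a) + '-' + esc(b)))
    .join('') + ']';
}

const letterClass = buildClass(toRanges(codePointsWithCategory(sample, 'L')));
const letters = new RegExp('^' + letterClass + '+$');
console.log(letterClass);
console.log(letters.test('ABa'), letters.test('A1'));
```

Merging consecutive code points into ranges keeps the generated class shorter, but with the full data set the pattern still grows to the size the answer warns about.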

煞人兵器 2025-01-03 23:59:41

No, JavaScript has a slightly different syntax. To match Unicode characters you have to use escapes such as \uXXXX. In practice, however, if your page and script files are in UTF-8, putting non-ASCII characters directly in a character class such as [абвг] works too.

http://www.javascriptkit.com/jsref/regexp.shtml
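A short sketch of both forms, assuming the script file is saved and served as UTF-8 for the literal variant. The variable names are illustrative only.

```javascript
// Literal non-ASCII characters inside a character class. This depends on
// the script being decoded with the same encoding it was saved in.
const literal = /[абвг]/;

// The same four letters written as a \uXXXX range (U+0430 to U+0433),
// which does not depend on the file encoding.
const escaped = /[\u0430-\u0433]/;

for (const ch of ['в', 'д', 'z']) {
  console.log(ch, literal.test(ch), escaped.test(ch));
}
```

The escaped form is the safer choice when the encoding of the page or script cannot be guaranteed.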

允世 2025-01-03 23:59:41

The library found here:

http://inimino.org/~inimino/blog/javascript_cset

seems to work for me, and it is fairly small and independent of other libraries.
