Groovy regex for matching words (even with accented letters)

Posted on 2024-10-26 04:31:08

I'm trying to tokenize words from any text, e.g.:

Ça me plaît.

It should be tokenized as "ça, me, plaît".
To do this, I want to strip all special characters from the string and then split it on whitespace. With this code:

text = text.toLowerCase().replaceAll(/[^\w]/, ' ')
def tokens = text.split(" ")

I get

a me pla t

Which is far from being useful.
What regex do I need here?

Thanks!
Mulone

心舞飞扬 2024-11-02 04:31:13

You could use \S (capital S) instead of \w. \S matches any non-whitespace character, while \s (lowercase) matches any whitespace character.

Hence, you'll have

text = text.toLowerCase().replaceAll(/[^\S]/, ' ')
def tokens = text.split(" ")
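
As a quick check of what a whitespace-only approach yields, here is a minimal sketch (the sample string is the one from the question; splitting directly on `\s+` is a simplification, not the code above). Because `\S` treats accented letters like any other non-whitespace character, the accents survive, but so does any punctuation attached to a word:

```groovy
// Split on runs of whitespace; accented letters are kept intact.
def text = 'Ça me plaît.'
def tokens = text.toLowerCase().split(/\s+/)

// Punctuation is not whitespace, so the trailing period stays on the last token.
assert tokens.toList() == ['ça', 'me', 'plaît.']
```

If punctuation should be dropped as well, the character class has to exclude it explicitly rather than rely on whitespace alone.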

关于从前 2024-11-02 04:31:12

This seems to work for me (at least for this situation):

'Ça me plaît.'.toLowerCase().replaceAll( /[^\p{javaLowerCase}]/, ' ').split( ' ' )
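
A related option is to match runs of Unicode letters directly instead of replacing everything else. By default, `\w` in Java regexes covers only `[a-zA-Z_0-9]`, which is why `ç` and `î` were dropped in the question. The sketch below assumes Groovy's `findAll` on strings and the `\p{L}` (any Unicode letter) class from `java.util.regex`; `tokenize` is a hypothetical helper name, not something from the thread:

```groovy
// Hypothetical helper: collect each run of Unicode letters as one token.
def tokenize(String text) {
    // \p{L} matches any Unicode letter, including accented ones such as ç and î.
    text.toLowerCase().findAll(/\p{L}+/)
}

assert tokenize('Ça me plaît.') == ['ça', 'me', 'plaît']
```

This assumes the accented letters are stored as single precomposed characters. Text that uses separate combining marks would likely need normalizing first, or a class such as `[\p{L}\p{M}]+`.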
