How do I match Latin Unicode characters in a ColdFusion or Java regex?
I'm looking for a ColdFusion or Java regex (to use in a replace function) that matches only digits [0-9] and letters [a-z], but also includes non-ASCII Portuguese letters (Unicode Latin, like ç and ã).
Something like this:
str = reReplaceNoCase(str, "match none number/letter but keep unicode latin chars", "", "ALL");
Input string: "informação 123 ?:#$%"
Desired outcome: "informação 123"
I know I can match letters and numbers with [a-z0-9], but this doesn't match letters such as ç and ã.
Try the alphanumeric character class:
\w, which should match letters, digits, and underscores. You can also use the special named class \p{L} (I don't know whether the Java regex parser supports it). So in C# your task can be done using the following regex:
[^\p{L}\s0-9]
This means: any character not in this class (all letters, whitespace, digits). It therefore matches ?:#$% in your example, and we can replace these characters with an empty string.
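For the Java side of the question, here is a minimal sketch of the same idea. The class name LatinFilter and the method keepLettersDigitsSpaces are hypothetical names chosen for illustration. It relies on java.util.regex.Pattern accepting the Unicode category \p{L}, which the Pattern documentation lists. Note that in Java \w is ASCII-only unless the UNICODE_CHARACTER_CLASS flag is set, so \w alone would not keep ç or ã. The input string is written with Unicode escapes only to avoid source-encoding problems.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LatinFilter {
    // Matches any single character that is NOT a Unicode letter (\p{L}),
    // whitespace (\s), or an ASCII digit (0-9).
    private static final Pattern UNWANTED = Pattern.compile("[^\\p{L}\\s0-9]");

    // Removes every character matched by UNWANTED; letters such as
    // ç and ã survive because they belong to the \p{L} category.
    static String keepLettersDigitsSpaces(String input) {
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // "informação 123 ?:#$%" written with Unicode escapes.
        String input = "informa\u00E7\u00E3o 123 ?:#$%";
        String cleaned = keepLettersDigitsSpaces(input);
        // The space before '?' is kept, so trim() is needed to get
        // exactly the desired "informação 123".
        System.out.println(cleaned.trim());
    }
}
```

Two caveats. The pattern keeps the space that preceded the removed symbols, hence the trim() call. And if the text contains decomposed characters (a base letter followed by a combining mark), the combining mark is not in \p{L} and would be stripped; adding \p{M} to the class is one way to keep it. In ColdFusion, which runs on the JVM, it is often possible to call the underlying Java string method directly, for example str.replaceAll("[^\p{L}\s0-9]", ""), but treat that as an assumption to verify on your engine and version.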