Java regex to exclude specific strings from a larger one

Posted 2024-08-20 06:16:11

I have been banging my head against this for some time now.
I want to capture all [a-z]+[0-9]? character sequences, excluding strings such as sin, cos, tan, etc.
Having done my regex homework, I expected the following regex to work:

(?:(?!(sin|cos|tan)))\b[a-z]+[0-9]?

As you can see, I am using negative lookahead along with alternation. The \b after the non-capturing group's closing parenthesis is critical to avoid matching the "in" of "sin", etc. The regex makes sense, and in fact I have tried it in RegexBuddy with Java as the target implementation and got the wanted result, but it doesn't work with Java's Matcher and Pattern objects!
Any thoughts?

cheers

Comments (3)

爱给你人给你 2024-08-27 06:16:11

The \b is in the wrong place. It would be looking for a word boundary that didn't have sin/cos/tan before it. But a boundary just after any of those would have a letter at the end, so it would have to be an end-of-word boundary, which it can't be if the next character is a-z.

Also, the negative lookahead would (if it worked) exclude strings like cost, which I'm not sure you want if you're just filtering out keywords.

I suggest:

\b(?!sin\b|cos\b|tan\b)[a-z]+[0-9]?\b

Or, more simply, you could just match \b[a-z]+[0-9]?\b and filter out the strings in the keyword list afterwards. You don't always have to do everything in regex.
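
Here is a minimal Java sketch of both suggestions side by side: the lookahead pattern above, and the plain pattern followed by a keyword filter. The class name, method names, and sample input are made up for illustration, and `Set.of` assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordFilter {
    // The suggested pattern; every \b is doubled because this is a Java string literal.
    static final Pattern WITH_LOOKAHEAD =
        Pattern.compile("\\b(?!sin\\b|cos\\b|tan\\b)[a-z]+[0-9]?\\b");

    // The simpler pattern: match every identifier, filter keywords afterwards.
    static final Pattern PLAIN = Pattern.compile("\\b[a-z]+[0-9]?\\b");

    // Hypothetical keyword list, taken from the question's examples.
    static final Set<String> KEYWORDS = Set.of("sin", "cos", "tan");

    // Collects all matches of the lookahead pattern.
    static List<String> viaLookahead(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WITH_LOOKAHEAD.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Collects all matches of the plain pattern, dropping exact keywords.
    static List<String> viaFilter(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PLAIN.matcher(input);
        while (m.find()) {
            if (!KEYWORDS.contains(m.group())) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "sin x1 + cost * tan y";
        System.out.println(viaLookahead(input)); // [x1, cost, y]
        System.out.println(viaFilter(input));    // [x1, cost, y]
    }
}
```

Both approaches keep cost and drop the bare keywords. The filter version is probably easier to extend if the keyword list grows.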

梨涡少年 2024-08-27 06:16:11

So you want [a-z]+[0-9]? (a sequence of at least one letter, optionally followed by a digit), unless that letter sequence resembles one of sin cos tan?

\b(?!(sin|cos|tan)(?=\d|\b))[a-z]+\d?\b

results:

cos   - no match
cosy  - full match
cos1  - no match
cosy1 - full match
bla9  - full match
bla99 - no match
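
The table above can be reproduced with a short Java check. This is only a sketch: the class and method names are made up, and "full match" is assumed to mean that `Matcher.matches()` succeeds on the whole string.

```java
import java.util.regex.Pattern;

public class LookaheadCheck {
    // Same pattern as above, with backslashes doubled for the Java string literal.
    static final Pattern P =
        Pattern.compile("\\b(?!(sin|cos|tan)(?=\\d|\\b))[a-z]+\\d?\\b");

    // True only when the entire string matches the pattern.
    static boolean fullMatch(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"cos", "cosy", "cos1", "cosy1", "bla9", "bla99"};
        for (String s : samples) {
            System.out.println(s + " - " + (fullMatch(s) ? "full match" : "no match"));
        }
    }
}
```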

北恋 2024-08-27 06:16:11

I forgot to escape the \b for Java, so \b should be written as \\b in the Java string literal, and it now works.
cheers
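
For anyone hitting the same thing, here is a small sketch of the difference. In a Java string literal, "\b" is the backspace character (U+0008), so the regex engine never sees a word-boundary token unless the backslash is doubled. The class name, helper method, and sample input are made up for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\b" here is a single backspace character, not a regex word boundary.
    static final Pattern UNESCAPED =
        Pattern.compile("(?:(?!(sin|cos|tan)))\b[a-z]+[0-9]?");

    // "\\b" puts a real backslash-b into the pattern, i.e. a word boundary.
    static final Pattern ESCAPED =
        Pattern.compile("(?:(?!(sin|cos|tan)))\\b[a-z]+[0-9]?");

    // True if the pattern matches anywhere in the input.
    static boolean finds(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "x1 + y";
        System.out.println((int) "\b".charAt(0));     // 8, the backspace code point
        System.out.println(finds(UNESCAPED, input));  // false: no backspace in the input
        System.out.println(finds(ESCAPED, input));    // true: matches "x1"
    }
}
```

This is presumably why the pattern behaved in RegexBuddy, where it is entered as a raw regex, and failed once pasted into Java source.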
