Regular expression that excludes specific words
I have a problem with a regex.
I need to write a regex that makes an exception for a set of specified words, for example: apple, orange, juice.
Given these words, it should match everything except those exact words.
applejuice (match)
yummyjuice (match)
yummy-apple-juice (match)
orangeapplejuice (match)
orange-apple-juice (match)
apple-orange-aple (match)
juice-juice-juice (match)
orange-juice (match)
apple (should not match)
orange (should not match)
juice (should not match)
If you really want to do this with a single regular expression, you might find lookaround helpful (especially negative lookahead in this example). Regex written for Ruby (some implementations have different syntax for lookarounds):
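As a rough illustration of the negative-lookahead idea, here is a minimal Python sketch. It is not the answerer's Ruby pattern; the names FORBIDDEN, PATTERN and is_allowed are hypothetical, and it assumes each input is a single-line candidate string that should be rejected only when it is exactly one of the forbidden words.

```python
import re

# Hypothetical word list, taken from the question.
FORBIDDEN = ("apple", "orange", "juice")

# (?!...) is a negative lookahead: before consuming anything, refuse to go on
# if a forbidden word is followed immediately by the end of the string (\Z).
PATTERN = re.compile(r"(?!(?:%s)\Z).+" % "|".join(map(re.escape, FORBIDDEN)))


def is_allowed(text):
    """Return True unless text is exactly one of the forbidden words."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("applejuice", "orange-juice", "juice-juice-juice", "apple"):
        print(sample, is_allowed(sample))
```

Because the lookahead only fails when the forbidden word runs up to the end of the string, compound inputs such as applejuice or orange-juice still pass.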
I noticed that apple-juice should match according to your parameters, but what about apple juice? I'm assuming that if you are validating apple juice you still want it to fail.
So, let's build a set of characters that count as a "boundary":
In most regexp flavors, \b counts as a "word boundary", but the standard list of "word characters" doesn't include - so you need to create a custom one. It could match with /\b(apple|orange|juice)\b/ if you weren't trying to catch - as well...
If you are only testing 'single word' tests you can go with a much simpler:
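The custom-boundary idea described above could look roughly like this in Python. This is a sketch under assumptions, not the answerer's original code: it assumes a "word" may contain letters, digits, underscores and hyphens, so a hyphen does not count as a boundary, and the names FORBIDDEN_WORD and contains_forbidden_word are hypothetical.

```python
import re

# Assumption: hyphens belong to the word, so "apple-juice" is one word,
# while "apple juice" is two. Plain \b would split on the hyphen as well.
FORBIDDEN_WORD = re.compile(r"(?<![\w-])(?:apple|orange|juice)(?![\w-])")

# For comparison: the standard word boundary treats "-" as a separator.
STANDARD_BOUNDARY = re.compile(r"\b(?:apple|orange|juice)\b")


def contains_forbidden_word(text):
    """Return True if a forbidden word stands alone under the custom boundary."""
    return FORBIDDEN_WORD.search(text) is not None


if __name__ == "__main__":
    for sample in ("apple juice", "apple-juice", "applejuice", "apple"):
        print(sample, contains_forbidden_word(sample))
```

With the standard \b version, apple-juice would be reported as containing a forbidden word, which is the behavior the custom boundary is meant to avoid.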
This gets some of the way there:
will match an entire string unless it consists only of one of the forbidden words.
Alternatively, if you're not using Ruby, or you're sure that your strings contain no line breaks, or you have set the option that ^ and $ do not match at the beginnings/ends of lines
will also work.
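To make the anchoring point concrete, here is a small Python sketch contrasting whole-string anchors with line anchors. The patterns are illustrative assumptions rather than the answerer's exact expressions, and re.MULTILINE is used only to imitate Ruby, where ^ and $ match at every line start and end.

```python
import re

WORDS = r"(?:apple|orange|juice)"

# Anchored to the whole string: \A and \Z ignore line breaks.
WHOLE_STRING = re.compile(r"\A(?!" + WORDS + r"\Z).*\Z")

# Line-anchored variant; re.MULTILINE makes ^ and $ match at each line.
PER_LINE = re.compile(r"^(?!" + WORDS + r"$).*$", re.MULTILINE)


def lines_allowed(text):
    """Return the lines of text that are not exactly a forbidden word."""
    return [m.group(0) for m in PER_LINE.finditer(text)]


if __name__ == "__main__":
    print(WHOLE_STRING.match("apple"))            # no match
    print(WHOLE_STRING.match("applejuice"))       # match
    print(PER_LINE.search("apple\nyummy"))        # finds the second line
```

The line-anchored pattern happily matches one acceptable line inside a multi-line string, which is why the whole-string anchors are the safer choice when the input may contain line breaks.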
Here's some easy copy-paste code that works for more than just exact-words exceptions.
Copy/Paste Code:
In the following regex, ONLY replace the all-caps sections with your regex.
Python regex
Ruby regex
PCRE regex
JavaScript
Impossible as of 6/17/2020, and probably won't be possible in the near future.
Full Examples
REGEX_BEFORE = \b
YOUR_NORMAL_PATTERN = \w+
REGEX_AFTER =
EXCEPTION_PATTERN = (apple|orange|juice)
Python regex
Ruby regex
PCRE regex
How does it work?
This uses decently complicated regex, namely Atomic Groups, Conditionals, Lookbehinds, and Named Groups.
The (?> is the start of an atomic group, which means it's not allowed to backtrack: if that group matches once but later gets invalidated because a lookbehind failed, then the whole group will fail to match. (We want this behavior in this case.)
The (?<exceptions_group_1> creates a named capture group. It's just easier than using numbers. Note that the atomic pattern first tries to find the exception, and then falls back on the normal pattern if it couldn't find the exception.
The real magic is in the (?(exceptions_group_1) part. This is a conditional asking whether or not exceptions_group_1 was successfully matched. If it was, then it tries to find always(?<=fail). That pattern (as it says) will always fail, because it's looking for the word "always" and then it checks 'does "ways"=="fail"', which it never will.
Because the conditional fails, the atomic group fails, and because it's atomic it's not allowed to backtrack (to try the normal pattern), since it already matched the exception.
This is definitely not how these tools were intended to be used, but it should work reliably and efficiently.
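The mechanism can be sketched in Python as follows. Several things here are assumptions rather than the answer's own code: Python's re module only gained (?>...) atomic groups in version 3.11, so this sketch emulates the atomic group with a lookahead plus a backreference; it uses the empty negative lookahead (?!) as the always-failing branch instead of always(?<=fail); it assumes the exceptions are whole words (hence the trailing \b); and the names WORDS_EXCEPT, atom, exc and words_except are hypothetical.

```python
import re

# Assumption: exceptions are whole words, so the exception pattern ends in \b.
EXCEPTION_PATTERN = r"(?:apple|orange|juice)\b"
NORMAL_PATTERN = r"\w+"

WORDS_EXCEPT = re.compile(
    r"\b"
    # Emulated atomic group: once a lookahead has succeeded it is never
    # re-entered, and the backreference then consumes what it captured.
    r"(?=(?P<atom>(?P<exc>" + EXCEPTION_PATTERN + r")|" + NORMAL_PATTERN + r"))"
    r"(?P=atom)"
    # Conditional: if the exception group matched, force the match to fail.
    r"(?(exc)(?!))"
)


def words_except(text):
    """Return every \\w+ word in text that is not one of the exceptions."""
    return [m.group(0) for m in WORDS_EXCEPT.finditer(text)]


if __name__ == "__main__":
    print(words_except("apple applejuice orange yummy juice"))
```

The exception alternative is tried first; when it matches, the conditional kills the attempt and the engine cannot fall back to the normal pattern at that position, which mirrors the behavior described for the atomic group.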
Exact answer to the original question in Ruby
Unlike other methods, this one can be modified to reject any pattern, such as any word not containing the sub-string "apple", "orange", or "juice".
Something like (PHP)