正则表达式(特定单词除外)

发布于 2024-08-13 01:30:32 字数 370 浏览 3 评论 0原文

我的正则表达式有问题。 我需要制作正则表达式,但一组指定的单词除外,例如:苹果、橙子、果汁。 给出这些单词,它将匹配除上面单词之外的所有单词。

applejuice (match)
yummyjuice (match)
yummy-apple-juice (match)
orangeapplejuice (match)
orange-apple-juice (match)
apple-orange-aple (match)
juice-juice-juice (match)
orange-juice (match)

apple (should not match)
orange (should not match)
juice (should not match)

I have problem with regex.
I need to make regex with an exception of a set of specified words, for example: apple, orange, juice.
and given these words, it will match everything except those words above.

applejuice (match)
yummyjuice (match)
yummy-apple-juice (match)
orangeapplejuice (match)
orange-apple-juice (match)
apple-orange-aple (match)
juice-juice-juice (match)
orange-juice (match)

apple (should not match)
orange (should not match)
juice (should not match)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

情话难免假 2024-08-20 01:30:32

如果您确实想使用单个正则表达式来执行此操作,您可能会发现环视很有帮助(尤其是本示例中的负向前瞻)。为 Ruby 编写的正则表达式(某些实现具有不同的环视语法):

rx = /^(?!apple$|orange$|juice$)/

If you really want to do this with a single regular expression, you might find lookaround helpful (especially negative lookahead in this example). Regex written for Ruby (some implementations have different syntax for lookarounds):

rx = /^(?!apple$|orange$|juice$)/
一身仙ぐ女味 2024-08-20 01:30:32

我注意到 apple-juice 应该根据您的参数进行匹配,但是 apple-juice 呢?我假设如果您正在验证苹果汁,您仍然希望它失败。

因此,让我们构建一组算作“边界”的字符:

/[^-a-z0-9A-Z_]/        // Will match any character that is <NOT> - _ or 
                        // between a-z 0-9 A-Z 

/(?:^|[^-a-z0-9A-Z_])/  // Matches the beginning of the string, or one of those 
                        // non-word characters.

/(?:[^-a-z0-9A-Z_]|$)/  // Matches a non-word or the end of string

/(?:^|[^-a-z0-9A-Z_])(apple|orange|juice)(?:[^-a-z0-9A-Z_]|$)/ 
   // This should >match< apple/orange/juice ONLY when not preceded/followed by another
   // 'non-word' character just negate the result of the test to obtain your desired
   // result.

在大多数正则表达式风格中 \b 算作“单词边界”,但“单词字符”的标准列表不包括 < code>- 因此您需要创建一个自定义的。如果您不想捕获 - ,它也可以与 /\b(apple|orange|juice)\b/ 匹配...

如果您只是测试“单个单词”测试你可以使用更简单的方法:

/^(apple|orange|juice)$/ // and take the negation of this...

I noticed that apple-juice should match according to your parameters, but what about apple juice? I'm assuming that if you are validating apple juice you still want it to fail.

So - lets build a set of characters that count as a "boundary":

/[^-a-z0-9A-Z_]/        // Will match any character that is <NOT> - _ or 
                        // between a-z 0-9 A-Z 

/(?:^|[^-a-z0-9A-Z_])/  // Matches the beginning of the string, or one of those 
                        // non-word characters.

/(?:[^-a-z0-9A-Z_]|$)/  // Matches a non-word or the end of string

/(?:^|[^-a-z0-9A-Z_])(apple|orange|juice)(?:[^-a-z0-9A-Z_]|$)/ 
   // This should >match< apple/orange/juice ONLY when not preceded/followed by another
   // 'non-word' character just negate the result of the test to obtain your desired
   // result.

In most regexp flavors \b counts as a "word boundary" but the standard list of "word characters" doesn't include - so you need to create a custom one. It could match with /\b(apple|orange|juice)\b/ if you weren't trying to catch - as well...

If you are only testing 'single word' tests you can go with a much simpler:

/^(apple|orange|juice)$/ // and take the negation of this...
南街九尾狐 2024-08-20 01:30:32

这得到了一些方法:

((?:apple|orange|juice)\S)|(\S(?:apple|orange|juice))|(\S(?:apple|orange|juice)\S)

This gets some of the way there:

((?:apple|orange|juice)\S)|(\S(?:apple|orange|juice))|(\S(?:apple|orange|juice)\S)
一世旳自豪 2024-08-20 01:30:32
\A(?!apple\Z|juice\Z|orange\Z).*\Z

将匹配整个字符串,除非它仅包含禁用单词之一。

或者,如果您不使用 Ruby,或者您确定您的字符串不包含换行符,或者您设置了 ^$ 在开头不匹配的选项/ 行尾

^(?!apple$|juice$|orange$).*$

也可以。

\A(?!apple\Z|juice\Z|orange\Z).*\Z

will match an entire string unless it only consists of one of the forbidden words.

Alternatively, if you're not using Ruby or you're sure that your strings contain no line breaks or you have set the option that ^ and $ do not match on beginnings/ends of lines

^(?!apple$|juice$|orange$).*$

will also work.

相对绾红妆 2024-08-20 01:30:32

下面是一些简单的复制粘贴代码,它们不仅仅适用于精确单词异常。

复制/粘贴代码:

在以下正则表达式中,仅将全部大写部分替换为您的正则表达式。

Python regex

pattern = r"REGEX_BEFORE(?>(?P<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER"

Ruby regex

pattern = /REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(<exceptions_group_1>)always(?<=fail)|)REGEX_AFTER/

PCRE regex

REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER

JavaScript

截至 2020 年 6 月 17 日,这是不可能的,并且在不久的将来可能也不会实现。

完整示例

REGEX_BEFORE = \b
YOUR_NORMAL_PATTERN = \w+
REGEX_AFTER =
EXCEPTION_PATTERN = (苹果|橙子|果汁)

Python regex

pattern = r"\b(?>(?P<exceptions_group_1>(apple|orange|juice))|\w+)(?(exceptions_group_1)always(?<=fail)|)"

Ruby regex

pattern = /\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/

PCRE regex

\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(exceptions_group_1)always(?<=fail)|)

它是如何工作的?

它使用相当复杂的正则表达式,即原子组、条件、后向查找和命名组。

  1. (?> 是原子组的开始,这意味着它不允许回溯:这意味着,如果该组匹配一次,但随后由于回溯失败而失效,那么整个组将无法匹配。(在这种情况下我们希望这种行为)。

  2. 。 exceptions_group_1> 创建一个命名捕获组。请注意,该模式首先尝试查找异常,如果找不到异常,则返回到正常模式。

  3. 请注意,原子模式首先尝试查找异常,如果找不到异常,则返回到正常模式。

    请注意,

  4. 真正的魔力在于(?(exceptions_group_1)。这是一个条件询问exceptions_group_1是否成功匹配。如果是,那么它会尝试查找always(? <=fail)。该模式(正如它所说的)总是会失败,因为它寻找单词“always”,然后检查“does“ways”==“fail””,而它永远不会失败。 .

  5. 因为条件失败,这意味着原子组失败,并且因为它是原子的,这意味着它不允许回溯(尝试寻找正常模式),因为它已经匹配了异常。

这绝对不是这些工具的预期用途,但它应该可靠且高效地工作。

Ruby 中原始问题的准确答案

/\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/

与其他方法不同,可以修改此方法以拒绝任何模式,例如不包含子字符串“apple”、“orange”或“juice”的任何单词。

/\b(?>(?<exceptions_group_1>\w*(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/

Here's some easy copy-paste code that works for more than just exact-words exceptions.

Copy/Paste Code:

In the following regex, ONLY replace the all-caps sections with your regex.

Python regex

pattern = r"REGEX_BEFORE(?>(?P<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER"

Ruby regex

pattern = /REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(<exceptions_group_1>)always(?<=fail)|)REGEX_AFTER/

PCRE regex

REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER

JavaScript

Impossible as of 6/17/2020, and probably won't be possible in the near future.

Full Examples

REGEX_BEFORE = \b
YOUR_NORMAL_PATTERN = \w+
REGEX_AFTER =
EXCEPTION_PATTERN = (apple|orange|juice)

Python regex

pattern = r"\b(?>(?P<exceptions_group_1>(apple|orange|juice))|\w+)(?(exceptions_group_1)always(?<=fail)|)"

Ruby regex

pattern = /\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/

PCRE regex

\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(exceptions_group_1)always(?<=fail)|)

How does it work?

This uses decently complicated regex, namely Atomic Groups, Conditionals, Lookbehinds, and Named Groups.

  1. The (?> is the start of an atomic group, which means its not allowed to backtrack: which means, If that group matches once, but then later gets invalidated because a lookbehind failed, then the whole group will fail to match. (We want this behavior in this case).

  2. The (?<exceptions_group_1> creates a named capture group. Its just easier than using numbers. Note that the pattern first tries to find the exception, and then falls back on the normal pattern if it couldn't find the exception.

  3. Note that the atomic pattern first tries to find the exception, and then falls back on the normal pattern if it couldn't find the exception.

  4. The real magic is in the (?(exceptions_group_1). This is a conditional asking whether or not exceptions_group_1 was successfully matched. If it was, then it tries to find always(?<=fail). That pattern (as it says) will always fail, because its looking for the word "always" and then it checks 'does "ways"=="fail"', which it never will.

  5. Because the conditional fails, this means the atomic group fails, and because it's atomic that means its not allowed to backtrack (to try to look for the normal pattern) because it already matched the exception.

This is definitely not how these tools were intended to be used, but it should work reliably and efficiently.

Exact answer to the original question in Ruby

/\b(?>(?<exceptions_group_1>(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/

Unlike other methods, this one can be modified to reject any pattern such as any word not containing the sub-string "apple","orange", or "juice".

/\b(?>(?<exceptions_group_1>\w*(apple|orange|juice))|\w+)(?(<exceptions_group_1>)always(?<=fail)|)/
蓝色星空 2024-08-20 01:30:32

类似(PHP)的东西

$input = "The orange apple gave juice";
if(preg_match("your regex for validating") && !preg_match("/apple|orange|juice/", $input))
{
  // it's ok;
}
else
{
  //throw validation error
}

Something like (PHP)

$input = "The orange apple gave juice";
if(preg_match("your regex for validating") && !preg_match("/apple|orange|juice/", $input))
{
  // it's ok;
}
else
{
  //throw validation error
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文