How do I write a regular expression that excludes rather than matches, e.g. not (this|string)?

Posted 2024-08-20 18:05:17

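I am stumped trying to create an Emacs regular expression that excludes groups. [^] excludes individual characters in a set, but I want to exclude specific sequences of characters: something like [^(not|this)], so that strings containing "not" or "this" are not matched.

In principle, I could write ([^n][^o][^t]|[^...]), but is there a cleaner way?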

枕梦 2024-08-27 18:05:17

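This is not easily possible. Regular expressions are designed to match things, and that is all they can do.

First off: [^] does not designate an "exclusion group"; it designates a negated character class. Character classes do not support grouping in any form or shape. They support single characters (and, for convenience, character ranges). As far as the regex engine is concerned, your attempt [^(not|this)] is 100% equivalent to [^)(|hinots].

Three ways lead out of this situation:

match (not|this) and exclude any matches with the help of the environment you are in (negate the match result)
use negative look-ahead, if your regex engine supports it and it is feasible in the situation
rewrite the expression so that it can match: see a similar question I asked earlier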
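
The first approach, matching the unwanted words and negating the result in the surrounding code, might look like this in Emacs Lisp. This is only a minimal sketch: the helper name my/without-forbidden and the sample strings are made up for illustration.

```elisp
(require 'seq)

;; Hypothetical helper: return the strings in CANDIDATES that contain
;; neither "not" nor "this".  The regexp matches what we do NOT want,
;; and `seq-remove' drops every element for which it matches.
(defun my/without-forbidden (candidates)
  (let ((case-fold-search nil))          ; compare case-sensitively
    (seq-remove (lambda (s) (string-match-p "not\\|this" s))
                candidates)))

(my/without-forbidden '("nil" "knot" "thistle" "bob" "cat"))
;; => ("nil" "bob" "cat")
```

The regexp stays trivial because the exclusion is done by the caller, not by the pattern.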

栩栩如生 2024-08-27 18:05:17

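First of all: [^n][^o][^t] is not a solution. It would also exclude words like nil ([^n] does not match), bob ([^o] does not match) or cat ([^t] does not match).

But it is possible to build a regular expression with basic syntax that is meant to match only strings that contain neither not nor this: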

^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$

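The idea behind this regular expression is to allow any character that is not the first character of one of the words, and otherwise only proper prefixes of the words, never a whole word. Note that, as written, it still accepts some strings that contain the words, such as nnot: the character consumed after the first n can itself be the start of a forbidden word.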
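
One way to check how the pattern behaves is to transcribe it into Emacs Lisp and try a few strings. The sketch below uses rx notation to avoid backslash clutter; it assumes that bos and eos (start and end of string) are acceptable stand-ins for ^ and $, and the names my/no-not-this-rx and my/accepted-p are invented for this example.

```elisp
(require 'rx)

;; The pattern above, transcribed branch by branch into rx notation.
(defconst my/no-not-this-rx
  (rx bos
      (* (or (not (any "nt"))
             (seq "n" (or eos (not (any "o"))
                          (seq "o" (or eos (not (any "t"))))))
             (seq "t" (or eos (not (any "h"))
                          (seq "h" (or eos (not (any "i"))
                                       (seq "i" (or eos (not (any "s"))))))))))
      eos))

;; Hypothetical helper: t if the pattern accepts the whole string S.
(defun my/accepted-p (s)
  (let ((case-fold-search nil))
    (and (string-match-p my/no-not-this-rx s) t)))

(my/accepted-p "nil")   ; => t
(my/accepted-p "knot")  ; => nil
(my/accepted-p "this")  ; => nil
(my/accepted-p "nnot")  ; => t, although the string contains "not"
```

The last call shows the gap: after the branch for n followed by a non-o character has consumed "nn", the remaining "ot" is accepted character by character.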

陈甜 2024-08-27 18:05:17

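It is hard to believe that the accepted answer (from Gumbo) was actually accepted! Unless it was accepted because it shows that you cannot do what you want. Unless you have a function that generates such regexps (as Gumbo shows), composing them by hand would be a real pain.

What is the real use case? What are you really trying to do?

As Tomalak pointed out, (a) this is not what regexps do; (b) see the other post he linked to for a good explanation, including what to do about your problem.

The answer is to use a regexp to match what you do not want, and then subtract that from the initial domain. In other words, do not try to make the regexp do the excluding (it cannot); do the excluding after using a regexp to match what you want to exclude.

This is how every tool that uses regexps works (e.g., grep): they offer a separate option (e.g., via syntax) that carries out the subtraction, after matching what needs to be subtracted.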

差↓一点笑了 2024-08-27 18:05:17

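It sounds like you are trying to do negative lookahead, i.e. you are trying to stop matching once you reach some delimiter.

Emacs does not support lookahead directly, but it does support non-greedy versions of the *, +, and ? operators (*?, +?, ??), which can be used for the same purpose in most cases.

So, for instance, to match the body of this JavaScript function: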

bar = function (args) {
    if (blah) {
        foo();
    }
};

You can use this Emacs regex:

function ([^)]+) {[[:ascii:]]+?};

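Here we stop as soon as we find the two-character sequence "};". [[:ascii:]] is used instead of the "." operator because it works across multiple lines ("." does not match a newline).

This is a little different from negative lookahead because the }; sequence itself is matched; however, if your goal is to extract everything up to that point, you can just use a capturing group, \( and \).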
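
As a rough illustration of the capturing-group variant, the regex can be tried from Lisp with string-match and match-string. The function name my/function-body is hypothetical, and the sample input is simply the JavaScript snippet from above written as a Lisp string.

```elisp
;; Hypothetical helper: return the text between "{" and the first "};"
;; that follows a "function (...)" header in SRC, or nil if none.
(defun my/function-body (src)
  (let ((case-fold-search nil))
    ;; Unescaped ( ) { } are literal in Emacs regexps; \( \) capture.
    (when (string-match "function ([^)]+) {\\([[:ascii:]]+?\\)};" src)
      (match-string 1 src))))

(my/function-body
 "bar = function (args) {\n    if (blah) {\n        foo();\n    }\n};")
;; => "\n    if (blah) {\n        foo();\n    }\n"
```

The non-greedy +? keeps extending the match only until the first "};", so the inner closing brace (which is followed by a newline, not a semicolon) does not end it.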

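See the Emacs regex manual: http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html

As a side note, if you are writing any kind of Emacs regex, be sure to invoke M-x re-builder, which brings up a little IDE for writing your regex against the current buffer.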

煮酒 2024-08-27 18:05:17

Try M-x flush-lines.
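
flush-lines can also be called from Lisp, where it deletes every line from point onward that contains a match. The following is a small sketch on a temporary buffer; the wrapper my/drop-matching-lines and the sample text are assumptions made for the example.

```elisp
;; Hypothetical wrapper: return TEXT with every line that matches
;; REGEXP removed, using `flush-lines' in a temporary buffer.
(defun my/drop-matching-lines (text regexp)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))            ; flush-lines works from point
    (let ((case-fold-search nil))
      (flush-lines regexp))
    (buffer-string)))

(my/drop-matching-lines "keep me\nnot me\nthis neither\nfine\n"
                        "not\\|this")
;; => "keep me\nfine\n"
```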

最美的太阳 2024-08-27 18:05:17

For the use case of testing a string with a logical condition, I do this:

;; Match strings that end with '-region' but exclude those that contain 'mouse'.
M-x ielm RET
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (setq str1 "mouse-drag-region" str2 "mou-drag-region" str3 "mou-region-drag")
"mou-region-drag"
ELISP> (and (string-match-p "-region$" str1) (not (string-match-p "mouse" str1)))
nil
ELISP> (and (string-match-p "-region$" str2) (not (string-match-p "mouse" str2))) 
t
ELISP> (and (string-match-p "-region$" str3) (not (string-match-p "mouse" str3)))
nil

I use this approach to avoid the bug in the function I discussed over here:
