How do I write a regular expression that excludes rather than matches, e.g., not (this|string)?

Posted 2024-08-20 18:05:17

I am stumped trying to create an Emacs regular expression that excludes groups. [^] excludes individual characters in a set, but I want to exclude specific sequences of characters: something like [^(not|this)], so that strings containing "not" or "this" are not matched.

In principle, I could write ([^n][^o][^t]|[^...]), but is there a cleaner way?

Comments (8)

枕梦 2024-08-27 18:05:17

This is not easily possible. Regular expressions are designed to match things, and that is all they can do.

First off: [^] does not designate an "excluded group"; it designates a negated character class. Character classes do not support grouping in any shape or form. They support single characters (and, for convenience, character ranges). As far as the regex engine is concerned, your attempt [^(not|this)] is 100% equivalent to [^)(|hinots].

There are three ways out of this situation:

  1. Match (not|this) and exclude any matches with the help of the environment you are in (negate the match result).
  2. Use negative lookahead, if your regex engine supports it and it is feasible in the situation.
  3. Rewrite the expression so that it can match: see a similar question I asked earlier.
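
To make the character-class point concrete, here is a small sketch in Python (chosen only because it is easy to run; for this purpose a bracket expression is read the same way in Emacs). It checks that [^(not|this)] and [^)(|hinots] accept exactly the same single characters. The test alphabet and the sample word are assumptions made for illustration.

```python
import re
import string

# The attempted "excluded group" and the plain negated character class
# that it is actually equivalent to.
attempt = re.compile(r"[^(not|this)]")
equivalent = re.compile(r"[^)(|hinots]")

# Compare the two patterns on every printable ASCII character.
alphabet = string.printable
accepted_by_attempt = {c for c in alphabet if attempt.fullmatch(c)}
accepted_by_equivalent = {c for c in alphabet if equivalent.fullmatch(c)}

same = accepted_by_attempt == accepted_by_equivalent

# The class rejects the individual letters n, o, t, h, i, s wherever they
# occur, so an unrelated word such as "sit" contains no matching character.
rejects_plain_word = attempt.search("sit") is None
```

Both sets come out identical, which is why the bracket form cannot express "not this sequence of characters".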

栩栩如生 2024-08-27 18:05:17

First of all: [^n][^o][^t] is not a solution. It would also exclude words like nil ([^n] does not match), bob ([^o] does not match), or cat ([^t] does not match).

But it is possible to build a regular expression with basic syntax that matches strings containing neither not nor this:

^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$

The idea behind this regular expression is to allow any character that is not the first character of one of the words, and to allow prefixes of the words only as long as they do not complete the whole word.
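
A hand-built pattern like this is easy to get subtly wrong, so it is worth checking by brute force. The sketch below uses Python's re module, which accepts the pattern unchanged (in Emacs the grouping and alternation operators would be written \( \) and \|). The alphabet and the length limit are arbitrary choices for illustration. On this check the pattern accepts every string that is free of both words, but it also accepts some strings that do contain them, such as nnot, because the character consumed after a partial prefix can itself begin a forbidden word.

```python
import itertools
import re

PATTERN = re.compile(
    r"^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$"
)


def is_clean(s):
    """Ground truth: the string contains neither forbidden word."""
    return "not" not in s and "this" not in s


def check(alphabet="nothisx", max_len=5):
    """Return (false_negatives, false_positives) over all short strings."""
    false_negatives, false_positives = [], []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            s = "".join(chars)
            accepted = PATTERN.match(s) is not None
            if is_clean(s) and not accepted:
                false_negatives.append(s)   # wrongly rejected
            elif accepted and not is_clean(s):
                false_positives.append(s)   # wrongly accepted
    return false_negatives, false_positives


false_negatives, false_positives = check()
```

If a pure-regexp solution is really required, generating the pattern mechanically and testing it this way is probably safer than composing it by hand.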

陈甜 2024-08-27 18:05:17

Hard to believe that the accepted answer (from Gumbo) was actually accepted! Unless it was accepted because it indicated that you cannot do what you want. Unless you have a function that generates such regexps (as Gumbo shows), composing them would be a real pain.

What is the real use case? What are you really trying to do?

As Tomalak indicated, (a) this is not what regexps do; (b) see the other post he linked to for a good explanation, including what to do about your problem.

The answer is to use a regexp to match what you do not want, and then subtract that from the initial domain. In other words, do not try to make the regexp do the excluding (it cannot); do the excluding after using a regexp to match what you want to exclude.

This is how every tool that uses regexps works (e.g., grep): they offer a separate option (e.g., via syntax) that carries out the subtraction after matching what needs to be subtracted.
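
The match-then-subtract idea can be sketched in a few lines. Python is used here purely for illustration; the sample lines and the helper name subtract_matches are made up. This is roughly what grep's -v option does for you on the command line.

```python
import re


def subtract_matches(lines, unwanted):
    """Keep only the lines in which the `unwanted` regexp does not match."""
    pattern = re.compile(unwanted)
    return [line for line in lines if pattern.search(line) is None]


# Hypothetical sample data.
lines = ["do not enter", "hello world", "take this", "nothing here", "plain text"]

# The regexp only says what to throw away; the exclusion happens in the code.
kept = subtract_matches(lines, r"not|this")
```

The regexp stays trivial (not|this), and the negation lives in ordinary code where it is easy to read.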

差↓一点笑了 2024-08-27 18:05:17

It sounds like you are trying to do negative lookahead, i.e., you are trying to stop matching once you reach some delimiter.

Emacs doesn't support lookahead directly, but it does support non-greedy versions of the *, +, and ? operators (*?, +?, ??), which can be used for the same purpose in most cases.

So, for instance, to match the body of this JavaScript function:

bar = function (args) {
    if (blah) {
        foo();
    }
};

you can use this Emacs regex:

function ([^)]+) {[[:ascii:]]+?};

Here we stop as soon as we find the two-character sequence "};". [[:ascii:]] is used instead of the "." operator because it also matches newlines and so works across multiple lines.

This is a little different from negative lookahead because the }; sequence itself is matched. However, if your goal is to extract everything up to that point, just use a capturing group, \( and \).

See the Emacs regex manual: http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html

As a side note, if you are writing any kind of Emacs regex, be sure to invoke M-x re-builder, which brings up a little IDE for writing your regex against the current buffer.
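
For comparison, the same non-greedy idea can be tried outside Emacs. The sketch below uses Python's re module, which is an assumption made for illustration: the syntax differs from Emacs, so the literal parentheses and brace are escaped, and [\s\S] plays the role of [[:ascii:]] as an "any character, newlines included" class. The capture group extracts the function body without the closing };.

```python
import re

source = """bar = function (args) {
    if (blah) {
        foo();
    }
};
"""

# +? is the non-greedy form of +: it stops at the first "};" it can reach.
# [\s\S] matches any character, including newlines.
pattern = re.compile(r"function \([^)]+\) \{([\s\S]+?)\};")

match = pattern.search(source)
body = match.group(1) if match else None
```

The inner closing brace is skipped because it is followed by a newline rather than a semicolon, so the match only ends at the final };.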

煮酒 2024-08-27 18:05:17

Try M-x flush-lines, which deletes the lines that contain a match for the regexp you give it.

最美的太阳 2024-08-27 18:05:17

For the use case of matching a string in a logical test, I do this:

;; Match strings that end with '-region' but exclude those that contain 'mouse'.
M-x ielm RET
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (setq str1 "mouse-drag-region" str2 "mou-drag-region" str3 "mou-region-drag")
"mou-region-drag"
ELISP> (and (string-match-p "-region$" str1) (not (string-match-p "mouse" str1)))
nil
ELISP> (and (string-match-p "-region$" str2) (not (string-match-p "mouse" str2)))
t
ELISP> (and (string-match-p "-region$" str3) (not (string-match-p "mouse" str3)))
nil

I use this approach to work around the bug in a function that I discussed in another post.

绝情姑娘 2024-08-27 18:05:17

My problem was how to pass a negated regexp to delete-lines. The solution was to pass the regexp to M-x keep-lines instead.

陈甜 2024-08-27 18:05:17

If you are trying to use a regex to find or replace text in a buffer, you can use https://github.com/benma/visual-regexp-steroids.el/

Visual regexp steroids lets you replace, search, etc. using Python regexes. Python regexes support negative lookahead and negative lookbehind.
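
As a rough sketch of what lookaround makes possible, the snippets below use Python's plain re module; whether a given pattern behaves identically inside visual-regexp-steroids is not verified here. The first pattern rejects any string containing not or this with a negative lookahead, and the second shows a simple negative lookbehind. The sample strings are made up.

```python
import re

# Negative lookahead: succeed only if neither "not" nor "this" appears
# anywhere in the string.
no_forbidden = re.compile(r"^(?!.*(?:not|this)).*$")

# Negative lookbehind: match "region" only when it is not preceded by "mouse-".
region_not_after_mouse = re.compile(r"(?<!mouse-)region")

samples = ["nothing", "take this", "nnot", "plain text", "nil"]
accepted = [s for s in samples if no_forbidden.match(s)]

lookbehind_hits = [
    s for s in ["mouse-region", "drag-region"]
    if region_not_after_mouse.search(s)
]
```

With lookahead available, the exclusion fits in one short pattern instead of a hand-expanded alternation.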
