How can I use a regular expression to find everything except certain phrases?

Posted 2024-10-01 08:35:57

Ok, so I have a phrase "foo bar" and I want to find everything BUT "foo bar".
Here's my text.

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

There's a way to do this just within regex right? I don't have to go and use strings etc. do I?

RESULT:

NOTE I can't do a nice highlighting but the bold gives you an idea (although the spaces that are before and after would also be selected but it breaks the bolding).

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

Assume PCRE nomenclature.


UPDATE 7/29/2013: it may be better to use a search and replace function in your language of choice to just 'remove' the phrases that you don't want so that you are then left with the info you do want.

Comments (6)

江心雾 2024-10-08 08:35:57

In general, if foobar matches itself, then (?s:(?!foobar).)* matches any run of characters that does not contain foobar, including the empty string.

You could use that to find lines that don’t have foobar in them, for example, using

^(?:(?!foobar).)*$
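
As a rough sketch of that line filter, here it is in Python. This assumes Python's re module as a stand-in for PCRE, swaps in the question's phrase "foo bar", and uses re.MULTILINE so that ^ and $ anchor at every line; the variable names are made up for illustration.

```python
import re

# Sample text from the question.
TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# Each character is consumed only if "foo bar" does not start at it,
# so a line can match from ^ to $ only when the phrase is absent.
LINE_WITHOUT_PHRASE = re.compile(r"^(?:(?!foo bar).)*$", re.MULTILINE)

clean_lines = LINE_WITHOUT_PHRASE.findall(TEXT)
print(clean_lines)
```

Only the second line of the sample survives, since the other three all contain the phrase.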

You could also use your language’s split() function to split on foobar, which will give you all the pieces that do not include the split pattern.
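
A minimal illustration of the split() route, again in Python as an assumed example language (the helper name split_around is hypothetical):

```python
import re

def split_around(text, phrase):
    """Return the pieces of text that lie between occurrences of phrase."""
    # re.escape keeps the phrase literal even if it contains metacharacters.
    return re.split(re.escape(phrase), text)

pieces = split_around("ipsum dolor foo bar Lorem foo bar amet", "foo bar")
print(pieces)
```

If the text starts or ends with the phrase, the result contains an empty string at that end, which you may want to filter out.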

Regarding the nasty little-known backtracking control verbs like (*FAIL) and (*COMMIT), I haven’t yet had much occasion to use them in ‘non-toy’ programs. I find that independent subexpressions via (?>...) and the possessive quantifiers *+, ++, ?+ etc. give me more than enough rope, so to speak.

That said, I do have one toy example of using (*FAIL) in this answer; it’s the very first regex solution. The reason for its being there was I wanted to force the regex engine to backtrack through all possible permutations; the real goal was merely to count how many ways it tried things.

Please understand that my two regexes there, along with the many, many incredibly creative answers from others, are all meant to be fun, tongue-in-cheek things. Still, one can learn a lot from them — once one recovers from shock. ☺

下壹個目標 2024-10-08 08:35:57

Try

^(?!.*foo bar).*$

This should select every line that does not contain "foo bar", provided ^ and $ match at line boundaries (multiline mode). (?! opens a negative lookahead.

花间憩 2024-10-08 08:35:57

"remove everything except foo bar" is equivalent to "find only foo bar", which PCRE allows quite easily. Conversely, "find everything except foo bar" is equivalent to "find and remove only foo bar". So, complementation is easily done from your tools.

Aside from that, PCRE has a nasty little feature known as (*FAIL), which forces an immediate backtrack when it's encountered. So, I suppose inserting something like (*COMMIT)foo bar(*FAIL) into your regular expression could help. It's neither friendly nor very safe, though.
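
The complementation idea from the first paragraph (find and remove only foo bar, keep the rest) could look like this in Python. This is only a sketch: Python's re module is assumed here, it does not support (*FAIL) or (*COMMIT), and the function name remove_phrase is invented for the example.

```python
import re

def remove_phrase(text, phrase):
    """Delete every occurrence of phrase, leaving everything else in place."""
    return re.sub(re.escape(phrase), "", text)

remaining = remove_phrase("dolor foo bar Lorem", "foo bar")
print(repr(remaining))
```

Note that the spaces on either side of the removed phrase stay behind, so two spaces end up adjacent.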

明月松间行 2024-10-08 08:35:57

Okay, so you want to remove everything except foo bar using UltraEdit's "Advanced" (Perl-regex style) search feature. The easiest way to do that is to match everything, but only capture foo bar, like this:

(?:(?!foo bar).)+(foo bar|$)

...and replace it with $1 or \1 (whichever style UltraEdit accepts).

I don't use UltraEdit, but in EditPadPro it converts this:

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar 

...to this:

foo bar

foo bar
foo bar

...which is the result you showed in your original message.
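
For readers without either editor, the same search and replace can be tried in Python. This is a sketch under assumptions: Python's re module instead of the editor's engine, \1 as the replacement syntax, and re.MULTILINE so that $ matches at the end of every line (which the editors appear to do by default).

```python
import re

TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# Consume everything up to the next "foo bar" (or end of line),
# capturing only the phrase itself (or nothing at end of line).
KEEP_ONLY_PHRASE = re.compile(r"(?:(?!foo bar).)+(foo bar|$)", re.MULTILINE)

result = KEEP_ONLY_PHRASE.sub(r"\1", TEXT)
print(result)
```

The second line contains no foo bar, so it collapses to an empty line, matching the output shown above.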

软糯酥胸 2024-10-08 08:35:57

Here: perl -pe 's{.*?(foo bar)?}{$1}g' <text

I want to find everything BUT "foo bar"

A match-only pattern without using substitution by $1 (that is usable with the empty replacement as in s{pattern}{})... not sure that is possible. You would have to gobble up chars up until foo bar, e.g. with .*?(?=foo bar). But then the matching algorithm continues on and sees "oo bar", and would match again as there is no f.

Continuing the quest, here is a piece of perl code that gobbles up the requested parts, only with the drawback that empty captures may be returned if foo bar happens to be at the start of the line:

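while (my $line = <>) {
        chomp $line;
        # In list context, /g returns capture group 1 from every match:
        # the text before each "foo bar" or before the end of the line.
        my @parts = $line =~ m{(.*?)(?:foo bar|$)}gs;
        print "[[ $_ ]]\n" for @parts;
}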

No substitution is involved, and running this on the Lorem ipsum test file shows everything but the foo bar parts. It is PCRE-compatible, but there is no guarantee that $EDITOR will do what you envision.
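
A Python rendering of the same capture-only idea, for comparison. It is a sketch that assumes Python's re module; the name pieces_outside_phrase is hypothetical, and the empty captures are simply filtered out.

```python
import re

# Group 1 lazily captures the text before each "foo bar" or end of line.
PIECE = re.compile(r"(.*?)(?:foo bar|$)")

def pieces_outside_phrase(line):
    """Return the non-empty stretches of line that are not "foo bar"."""
    # findall returns group 1 of every match; empty captures are dropped.
    return [piece for piece in PIECE.findall(line) if piece]

for line in ["ipsum dolor foo bar Lorem ipsum dolor sit amet,", "dolore foo bar"]:
    print(pieces_outside_phrase(line))
```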

瞄了个咪的 2024-10-08 08:35:57

To show everything except "foo bar" and "fad bad", this worked for me:

^(?!.*foo bar)(?!.*fad bad).*$
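
The pattern generalizes to any number of phrases by chaining one negative lookahead per phrase. A possible sketch in Python, assuming the re module with re.MULTILINE; the function name lines_without and the sample text are invented for the example:

```python
import re

def lines_without(text, phrases):
    """Return the lines of text that contain none of the given phrases."""
    # One (?!.*phrase) per phrase; all of them must pass at the line start.
    lookaheads = "".join(f"(?!.*{re.escape(p)})" for p in phrases)
    pattern = re.compile(rf"^{lookaheads}.*$", re.MULTILINE)
    return pattern.findall(text)

sample = "keep me\nhas foo bar here\nfad bad too\nalso fine"
kept = lines_without(sample, ["foo bar", "fad bad"])
print(kept)
```

Because . does not match a newline by default, each lookahead only inspects the current line.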
