Ok, so I have a phrase "foo bar" and I want to find everything BUT "foo bar".
Here's my text.
ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar
There's a way to do this purely within regex, right? I don't have to fall back on string functions etc., do I?
RESULT:
NOTE: I can't do nice highlighting, but the bold gives you an idea (the spaces before and after each "foo bar" would also be selected, but including them breaks the bolding).
ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar
Assume PCRE nomenclature.
UPDATE 7/29/2013: it may be better to use a search and replace function in your language of choice to just 'remove' the phrases that you don't want so that you are then left with the info you do want.
In general, if foobar matches itself, then (?s:(?!foobar).)* matches anything that is not foobar, including nothing at all.
You could use that to find lines that don't have foobar in them, for example, using ^(?:(?!foobar).)*$
You could also use your language's split() function to split on foobar, which will give you all the pieces that do not include the split pattern.
Regarding the nasty little-known backtracking control verbs like (*FAIL) and (*COMMIT), I haven't yet had much occasion to use them in 'non-toy' programs. I find that independent subexpressions via (?>...) and the possessive quantifiers *+, ++, ?+ etc. give me more than enough rope, so to speak.
That said, I do have one toy example of using (*FAIL) in this answer; it's the very first regex solution. The reason for its being there was that I wanted to force the regex engine to backtrack through all possible permutations; the real goal was merely to count how many ways it tried things.
Please understand that my two regexes there, along with the many, many incredibly creative answers from others, are all meant to be fun, tongue-in-cheek things. Still, one can learn a lot from them, once one recovers from shock. ☺
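The line filter and the split() idea from this answer can be sketched as follows. This is a minimal illustration in Python's re module rather than PCRE (an assumption on my part, since the thread names no host language); the phrase is written as "foo bar" to match the question's sample text, and the variable names are made up for the example.

```python
import re

# Sample text from the question (no trailing newline, so there is no empty last line).
text = (
    "ipsum dolor foo bar Lorem ipsum dolor sit amet,\n"
    "consectetur adipisicing elit, sed do\n"
    "eiusmod tempor foo bar incididunt ut labore et\n"
    "dolore foo bar"
)

# Anchored tempered pattern: each "." is consumed only at positions where
# "foo bar" does not start, so a line matches only if the phrase never occurs in it.
line_without_phrase = re.compile(r"^(?:(?!foo bar).)*$", re.MULTILINE)
clean_lines = line_without_phrase.findall(text)

# Splitting on the phrase returns the pieces between its occurrences.
pieces = re.split(r"foo bar", text)

print(clean_lines)
print(pieces)
```

Only the second line survives the filter. The split result ends with an empty string because the sample text ends with the phrase itself.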
Try
This should select every line that does not contain "foo bar". ((?! opens a negative lookahead.)
"Remove everything except foo bar" is equivalent to "find only foo bar", which PCRE allows quite easily. Conversely, "find everything except foo bar" is equivalent to "find and remove only foo bar". So, complementation is easily done from your tools.
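The two complementary operations can be sketched like this. Python's re module is used as a stand-in for whatever tool you have (an assumption, not something the answer specifies), and the variable names are illustrative only.

```python
import re

text = "ipsum dolor foo bar Lorem ipsum"
phrase = re.compile(r"foo bar")

# "Find only foo bar": collect every occurrence of the phrase.
only_phrase = phrase.findall(text)

# "Find everything except foo bar": remove the phrase and keep what is left.
everything_else = phrase.sub("", text)

print(only_phrase)
print(repr(everything_else))
```

Note that removing the phrase leaves the spaces on both sides of it in place, so the remaining text has a double space at that spot.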
Aside from that, PCRE has a nasty little feature known as (*FAIL), which immediately causes a backtrack when it's encountered. So, I suppose inserting something like (*COMMIT)foo bar(*FAIL) into your regular expression could help. It's neither friendly nor very safe, though.
Okay, so you want to remove everything except foo bar using UltraEdit's "Advanced" (Perl-regex style) search feature. The easiest way to do that is to match everything, but only capture foo bar, like this:
...and replace it with $1 or \1 (whichever style UltraEdit accepts).
I don't use UltraEdit, but in EditPadPro it converts this:
...to this:
...which is the result you showed in your original message.
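The answer's own regex did not survive on this page, so the following is only a sketch of the "match everything, capture only foo bar" idea under my own assumptions: the pattern is one plausible way to do it, not necessarily the author's, and it uses Python's re module, where the replacement reference is written \1.

```python
import re

text = "ipsum dolor foo bar Lorem ipsum\ndolore foo bar and more"

# Lazily skip anything, then capture either the phrase or the end of the input.
# Group 1 holds "foo bar" when the phrase is found and is empty at the end.
keep_phrase = re.compile(r".*?(foo bar|\Z)", re.DOTALL)

# Replacing each match with group 1 drops everything except the phrase.
result = keep_phrase.sub(r"\1", text)

print(result)
```

Because nothing between the occurrences is kept, the surviving phrases end up directly next to each other; capture a separator as well if you want them on separate lines.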
Here:
perl -pe 's{.*?(foo bar)?}{$1}g' <text
A match-only pattern that does not rely on substituting $1 (that is, one usable with an empty replacement as in s{pattern}{})... I'm not sure that is possible. You would have to gobble up characters up to foo bar, e.g. with .*?(?=foo bar). But then the matching algorithm continues, sees "oo bar", and matches again because there is no leading f.
Continuing the quest, here is a piece of Perl code that gobbles up the requested parts, with the only drawback that empty captures may be returned if foo bar happens to be at the start of the line:
There is no substitution involved, and running this on the Lorem ipsum test file will show everything but the foo bar parts. It is PCRE compatible, but there is no guarantee that $EDITOR will do what you envision.
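A substitution-free scan of this kind can be sketched as below. This is not the Perl code from the answer above (which is missing from this page) but a rough Python equivalent under my own assumptions: each match captures the text leading up to the next "foo bar" or up to the end of the input, and the names scanner, parts and non_empty are made up for the example.

```python
import re

text = "foo bar starts here, then foo bar again, and a tail"

# Group 1 lazily captures text up to the next phrase or up to the end of input.
scanner = re.compile(r"(.*?)(?:foo bar|\Z)", re.DOTALL)
parts = [m.group(1) for m in scanner.finditer(text)]

# Empty captures appear when the phrase sits at the very start (and possibly
# at the end of the scan), so they are filtered out afterwards.
non_empty = [p for p in parts if p]

print(non_empty)
```

The first capture is empty here because the text begins with the phrase, which is the drawback the answer mentions.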
To show every line that contains neither "foo bar" nor "fad bad", this worked for me:
^(?!.*foo bar)(?!.*fad bad).*$
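A quick way to try this pattern is sketched below, assuming Python's re module with the MULTILINE flag so that ^ and $ apply per line; the sample lines are invented for the illustration.

```python
import re

text = "\n".join([
    "foo bar here",
    "clean line",
    "some fad bad",
    "another clean",
])

# Each negative lookahead rejects the line if its phrase occurs anywhere in it;
# ".*$" then matches the whole line when both checks pass.
pattern = re.compile(r"^(?!.*foo bar)(?!.*fad bad).*$", re.MULTILINE)
kept = pattern.findall(text)

print(kept)
```

Further phrases can be excluded by adding one more (?!.*phrase) group per phrase after the ^.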