Regex greediness problem
I'm sure this one is easy, but I've tried a ton of variations and still can't match what I need. The pattern is being too greedy and I can't get it to stop being greedy.
Given the text:
test=this=that=more text follows
I want to just select:
test=
I've tried the following regexes:
(\S+)=(\S.*)
(\S+)?=
[^=]{1}
...
Thanks all.
Comments (6)
Here:
You should consider using the second version over the first. Given your string "test=this=that=more text follows", version 1 will match "test=this=that=" and then continue parsing to the end of the string. It will then backtrack and find "test=this=", continue to backtrack and find "test=", and continue to backtrack until it settles on "test=" as its final answer.
Version 2 will match "test=" and then stop. You can see the efficiency gains in larger searches, such as multi-line or whole-document matches.
You probably want something like
^(\S+?=)
The caret ^ anchors the regex to the beginning of the string. The ? after the + makes the + non-greedy.
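As a rough illustration of the difference, here is a minimal Python sketch that runs the greedy pattern from the question and the anchored lazy pattern against the sample text. The variable names are only for illustration, and other regex flavors may differ in details.

```python
import re

text = "test=this=that=more text follows"

# Greedy: \S+ grabs as many non-space characters as it can,
# then backs off only far enough to find a trailing '='.
greedy_match = re.search(r"(\S+)=", text).group(0)

# Lazy and anchored: \S+? grows one character at a time from the
# start of the string and stops at the first '=' it can reach.
lazy_match = re.match(r"^(\S+?=)", text).group(1)

print(greedy_match)  # test=this=that=
print(lazy_match)    # test=
```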
You might be looking for lazy quantifiers: *?, +?, ??, and {n,m}?
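To make the list concrete, here is a small sketch in Python's re module that pairs each greedy quantifier with its lazy form. The sample patterns are made up for illustration and are not taken from the question.

```python
import re

text = "test=this=that=more text follows"

# Each pair is (greedy pattern, lazy pattern).
pairs = [
    (r".*=", r".*?="),           # *  vs *?
    (r"\w+", r"\w+?"),           # +  vs +?
    (r"t?", r"t??"),             # ?  vs ??
    (r"\w{2,4}", r"\w{2,4}?"),   # {n,m} vs {n,m}?
]

results = []
for greedy, lazy in pairs:
    g = re.match(greedy, text).group(0)
    l = re.match(lazy, text).group(0)
    results.append((g, l))
    print(f"{greedy!r:12} -> {g!r:20} {lazy!r:12} -> {l!r}")
```

In every pair the lazy form matches the shortest text that still lets the overall pattern succeed, which is why .*?= stops at the first equals sign.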
You should be able to use this:
Lazy quantifiers work, but they can also be a performance hit because of backtracking.
Consider that what you really want is "a bunch of non-equals, an equals, and a bunch more non-equals."
Your example of
[^=]{1}
matches only a single non-equals character.
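One way to spell out the "bunch of non-equals, then an equals" idea is a negated character class with a repetition quantifier. The patterns below are an assumed reading of that description, sketched in Python, and not necessarily the exact regex this answer had in mind.

```python
import re

text = "test=this=that=more text follows"

# [^=]* cannot cross an '=', so the match has to end at the first one.
# No lazy quantifier is needed.
first_field = re.match(r"[^=]*=", text).group(0)

# "Non-equals, an equals, more non-equals" with capture groups.
key, value = re.match(r"([^=]+)=([^=]*)", text).groups()

print(first_field)  # test=
print(key, value)   # test this
```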
If you want only "text=", I think that a simple:
should be fine, as long as you are sure that the string "text=" will always start the line.
The real problem is when the string is like this:
If you use the regex above, the result is "this=", and if you modify it by adding a repetition quantifier at the end, like this:
you get a much longer "this=that=", so I can only imagine the trivial:
Bye.