regex language-agnostic regex-greedy

Regex not greedy enough

Posted on 2024-09-05 22:08:58

I've got the following regex, which was working perfectly until a new situation arose:

^.*[?&]U(?:RL)?=(?<URL>.*)$

Basically, it's run against URLs to grab EVERYTHING after U= or URL= and return it in the URL group.

So, for the following

http://localhost?a=b&u=http://otherhost?foo=bar

URL = http://otherhost?foo=bar

Unfortunately an odd case came up

http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost

Ideally, I want URL to be "http://otherhost?foo=bar&url=http://someotherhost"; instead, it is just "http://someotherhost".
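For anyone who wants to reproduce this, here is a minimal Python sketch. It is an adaptation rather than the original code: Python's `re` module spells named groups `(?P<URL>...)` instead of `(?<URL>...)`, and case-insensitive matching (`re.IGNORECASE`) is an assumption, made because the sample inputs use lowercase `u=` and `url=`. The helper name `extract_url` is hypothetical.

```python
import re

# Python adaptation of the pattern above (named-group syntax changed,
# case-insensitivity assumed).
PATTERN = re.compile(r"^.*[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)


def extract_url(text):
    """Return the URL group, or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.group("URL") if match else None


simple = "http://localhost?a=b&u=http://otherhost?foo=bar"
odd = "http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost"

print(extract_url(simple))  # http://otherhost?foo=bar
print(extract_url(odd))     # http://someotherhost
```

The second call shows the odd case: the leading `.*` runs as far right as it can, so the last `url=` wins.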

EDIT: I think this fixed it...though it's not pretty

^.*[?&](?<![?&]U(?:RL)?=.*)U(?:RL)?=(?<URL>.*)$

梨涡少年 2024-09-12 22:08:58

The issue

The problem is not that .* is not being greedy enough; it's that the other .* that appears earlier is also greedy.

To illustrate the issue, let's consider a different example. Consider the following two patterns; they're identical, except that \1 is reluctant in the second pattern:

              \1 greedy, \2 greedy         \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$           ^([0-5]*?)([5-9]*)$

Here we have two capturing groups. \1 captures [0-5]*, and \2 captures [5-9]*. Here's a side-by-side comparison of what these patterns match and capture:

              \1 greedy, \2 greedy          \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$            ^([0-5]*?)([5-9]*)$
Input         Group 1    Group 2            Group 1    Group 2
54321098765   543210     98765              543210     98765
007           00         7                  00         7
0123456789    012345     6789               01234      56789
0506          050        6                  050        6
555           555        <empty>            <empty>    555
5550555       5550555    <empty>            5550       555

Note that as greedy as \2 is, it can only grab what \1 didn't already grab first! Thus, if you want \2 to grab as many 5s as possible, you have to make \1 reluctant, so that the 5s are actually up for grabs by \2.
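The comparison can be checked with a short Python sketch. These two patterns use only numbered groups, so they carry over to Python's `re` module unchanged; the helper name `capture` is hypothetical.

```python
import re

# \1 greedy, \2 greedy
GREEDY = re.compile(r"^([0-5]*)([5-9]*)$")
# \1 reluctant, \2 greedy
RELUCTANT = re.compile(r"^([0-5]*?)([5-9]*)$")

INPUTS = ["54321098765", "007", "0123456789", "0506", "555", "5550555"]


def capture(pattern, text):
    """Return (group 1, group 2) for a match, or None if there is no match."""
    match = pattern.match(text)
    return match.groups() if match else None


for text in INPUTS:
    # An empty string in the output corresponds to <empty> in the table.
    print(text, capture(GREEDY, text), capture(RELUCTANT, text))
```

The rows where the two columns differ are exactly the inputs with a 5 sitting at the boundary between the two groups.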

The fix

So, applying this to your problem, there are two ways to fix it. You can make the first .* reluctant (see on rubular.com: http://www.rubular.com/r/Ks5wB7LNBx):

^.*?[?&]U(?:RL)?=(?<URL>.*)$

Alternatively, you can just get rid of the prefix-matching part altogether (see on rubular.com: http://www.rubular.com/r/YTm9YuLQVi):

[?&]U(?:RL)?=(?<URL>.*)$
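Both variants can be tried with a small Python sketch. As assumptions for this illustration: the named group is written `(?P<URL>...)` because that is how Python's `re` module spells it, `re.IGNORECASE` is used because the sample input has lowercase `u=` and `url=`, and the helper name `first_url` is hypothetical.

```python
import re

# Variant 1: reluctant prefix.
RELUCTANT_PREFIX = re.compile(r"^.*?[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)
# Variant 2: no prefix at all.
NO_PREFIX = re.compile(r"[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)

odd = "http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost"


def first_url(pattern, text):
    """Return the URL group of the leftmost match, or None if nothing matches."""
    # search() scans for the leftmost match; match() would only try position 0,
    # which the prefix-free variant cannot satisfy.
    match = pattern.search(text)
    return match.group("URL") if match else None


print(first_url(RELUCTANT_PREFIX, odd))
print(first_url(NO_PREFIX, odd))
# Both print: http://otherhost?foo=bar&url=http://someotherhost
```

In Python the prefix-free pattern has to be used with `search`, since `match` only attempts a match at the start of the string.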

About the author

旧城空念
