Match all "http" URLs only, with no extra trailing characters
I have tried the expressions below.
(http:\/\/.*?)['\"\< \>]
(http:\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;\"]*[-a-zA-Z0-9+&@#\/%=~_|\"])
The first one works well, but each match always includes one extra trailing character after the URL.
E.g.:
http://domain.com/path.html"
http://domain.com/path.html<
Notice the trailing " and <. I don't want them included with the URLs.
You can use lookahead instead of making ['\"\< >] part of your match, i.e.:
(http:\/\/.*?)(?=['\"\< >])
Generally speaking, whereas the pattern ab matches ab, the pattern a(?=b) matches only a (and only if it is followed by b).
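As a rough illustration of the lookahead idea, here is a minimal Python sketch. The sample text is hypothetical, and Python's re module is an assumption, since the question does not say which engine is in use. It compares a pattern that consumes the terminator with one that only asserts it.

```python
import re

# Hypothetical sample text; the quote, angle bracket and space act as terminators.
text = 'a "http://domain.com/path.html" b <http://domain.com/other.html> c'

# Consuming variant: the terminator character becomes part of the match.
consuming = re.compile(r"http://.*?['\"< >]")
# Lookahead variant: the terminator must follow, but is not consumed.
lookahead = re.compile(r"http://.*?(?=['\"< >])")

with_terminator = consuming.findall(text)
without_terminator = lookahead.findall(text)

print(with_terminator)     # URLs with the trailing " or > attached
print(without_terminator)  # URLs only
```

Because the lookahead still requires a terminator to follow, a URL at the very end of the input with nothing after it would not match in this sketch.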
Capturing group option
Lookarounds are not supported by all flavors. Capturing groups are more widely supported.
Generally speaking, whereas (a)b still matches ab, it also captures a in group 1.
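To make the capturing group option concrete, the sketch below runs the question's first pattern (with the redundant escapes dropped) in Python, which is an assumed engine, against a hypothetical sample text. It suggests that the whole match carries the extra character while group 1 holds just the URL.

```python
import re

# Hypothetical sample text.
text = 'href="http://domain.com/path.html" and <http://domain.com/other.html>'

# Group 1 captures the URL; the terminator is matched but left outside the group.
pattern = re.compile(r"(http://.*?)['\"< >]")

for m in pattern.finditer(text):
    # group(0) is the whole match, group(1) is the captured URL.
    print(repr(m.group(0)), "->", repr(m.group(1)))

whole_matches = [m.group(0) for m in pattern.finditer(text)]
urls = [m.group(1) for m in pattern.finditer(text)]
```

So, depending on the API in use, reading group 1 rather than the whole match may already be enough.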
Negated character class option
Depending on the need, using a negated character class is often much better than using a reluctant .*? (followed, in this case, by a lookahead to assert the terminator pattern).
Let's consider the problem of matching "everything between A and ZZ". As it turns out, this specification is ambiguous: we will come up with 3 patterns that do this, and they will yield different matches. Which one is "correct" depends on the expectation, which is not properly conveyed in the original statement.
We use the following as input:
We use 3 different patterns:
A(.*)ZZ yields 1 match: AiiZooAuuZZeeeZZ, with group 1 capturing iiZooAuuZZeee (as seen on ideone.com)
A(.*?)ZZ yields 1 match: AiiZooAuuZZ, with group 1 capturing iiZooAuu (as seen on ideone.com)
A([^Z]*)ZZ yields 1 match: AuuZZ, with group 1 capturing uu (as seen on ideone.com)
Here's a visual representation of what they matched:
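The three results can be reproduced with a short Python sketch. The input string below is an assumption: the original input is not shown on this page, so the string was chosen only because it is consistent with the three matches listed. The script underlines each matched span beneath the input.

```python
import re

# Assumed input, chosen only to be consistent with the matches listed in the answer.
text = "eeAiiZooAuuZZeeeZZfff"

patterns = [r"A(.*)ZZ", r"A(.*?)ZZ", r"A([^Z]*)ZZ"]

results = {}
for p in patterns:
    m = re.search(p, text)
    # Store (whole match, group 1) for each pattern.
    results[p] = (m.group(0), m.group(1))
    # Print the input, then carets under the matched span.
    print(f"{p:<12} {text}")
    print(" " * 13 + " " * m.start() + "^" * (m.end() - m.start()))
```

The greedy pattern runs to the last ZZ, the reluctant one stops at the first ZZ after the first A, and the negated class cannot cross a single Z, so it has to start from the second A.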
You need to use "(?=regex)" (lookahead), which looks for a particular pattern but doesn't include it in the result:
Hmmm, I'd probably do this simply by saying "keep going until you get an unwanted character", like so:
Escaped version (based on Q - not sure what engine this is):
However, the lookahead solution by polygenelubricants is more flexible if the URL might contain some of those characters (but not at the end).
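The "keep going until you get an unwanted character" idea can be sketched with a negated character class. The pattern below is an assumption built from the terminator set in the question, not necessarily the exact expression this answer used, and Python's re module and the sample text are likewise assumed.

```python
import re

# Hypothetical sample text.
text = 'a "http://domain.com/path.html" b <http://domain.com/other.html> c'

# Consume characters until a quote, angle bracket or space is reached.
pattern = re.compile(r"http://[^'\"<> ]+")

urls = pattern.findall(text)
print(urls)
```

Unlike the lookahead version, this form does not require a terminator to be present, so it would also match a URL that runs to the end of the input.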