Match all "http" URLs only, with no additional characters

Posted on 2024-09-09 10:21:23

I have tried the below expressions.

(http:\/\/.*?)['\"\< \>]


(http:\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;\"]*[-a-zA-Z0-9+&@#\/%=~_|\"])

The first one works well, but each match always includes the extra character that follows the URL.

For example:

http://domain.com/path.html" 

http://domain.com/path.html<

Notice the trailing characters:

" <

I don't want them included with the URLs.


Comments (3)

撕心裂肺的伤痛 2024-09-16 10:21:23


You can use lookahead instead of making ['\"\< >] part of your match, i.e.:

(http:\/\/.*?)(?=['\"\< >])

Generally speaking, whereas ab matches ab, a(?=b) matches a (if it's followed by b).
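
As a minimal sketch of the difference, the snippet below runs the consuming pattern and the lookahead pattern over the same text using Python's re module. The sample text and variable names are made up for illustration; only the terminator set comes from the question.

```python
import re

# Hypothetical sample text containing two URLs, each followed by a terminator.
text = 'see http://domain.com/path.html" and http://domain.com/other.html< here'

# The terminator is part of the match, so it shows up in the result.
consuming = re.compile(r"http://.*?['\"< >]")

# The terminator is only asserted by the lookahead and is not consumed.
lookahead = re.compile(r"http://.*?(?=['\"< >])")

with_terminator = consuming.findall(text)
without_terminator = lookahead.findall(text)

print(with_terminator)
print(without_terminator)
```

The slashes are left unescaped here because Python patterns have no delimiter that would require escaping them.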



Capturing group option

Lookarounds are not supported by all flavors. More widely supported are capturing groups.

Generally speaking, whereas (a)b still matches ab, it also captures a in group 1.
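
A small sketch of the capturing group option, again in Python and with invented sample text: the whole match still contains the terminator, but group 1 holds only the URL.

```python
import re

# Hypothetical sample text; the pattern is the first one from the question.
text = 'href="http://domain.com/path.html" <a>http://domain.com/other.html</a>'
pattern = re.compile(r"(http://.*?)['\"< >]")

# group(0) is the whole match, including the terminator character.
whole = [m.group(0) for m in pattern.finditer(text)]

# group(1) is only what the parentheses captured, i.e. the URL itself.
urls = [m.group(1) for m in pattern.finditer(text)]

print(whole)
print(urls)
```

Note that this pattern requires a terminator character to be present, so a URL at the very end of the input would not be matched.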



Negated character class option

Depending on the need, using a negated character class is often much better than using a reluctant .*? (followed, in this case, by a lookahead to assert the terminator pattern).

Let's consider the problem of matching "everything between A and ZZ". As it turns out, this specification is ambiguous: we will come up with 3 patterns that do this, and they will yield different matches. Which one is "correct" depends on the expectation, which is not properly conveyed in the original statement.

We use the following as input:

eeAiiZooAuuZZeeeZZfff

We use 3 different patterns:

  • A(.*)ZZ yields 1 match: AiiZooAuuZZeeeZZ (as seen on ideone.com)
    • This is the greedy variant; group 1 matched and captured iiZooAuuZZeee
  • A(.*?)ZZ yields 1 match: AiiZooAuuZZ (as seen on ideone.com)
    • This is the reluctant variant; group 1 matched and captured iiZooAuu
  • A([^Z]*)ZZ yields 1 match: AuuZZ (as seen on ideone.com)
    • This is the negated character class variant; group 1 matched and captured uu

Here's a visual representation of what they matched:

         ___n
        /   \              n = negated character class
eeAiiZooAuuZZeeeZZfff      r = reluctant
  \_________/r   /         g = greedy
   \____________/g
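
The three results above can be reproduced with a short Python sketch. The dictionary keys and variable names are only labels chosen for this example.

```python
import re

text = "eeAiiZooAuuZZeeeZZfff"

# The three patterns from the list above, keyed by illustrative labels.
patterns = {
    "greedy": r"A(.*)ZZ",
    "reluctant": r"A(.*?)ZZ",
    "negated": r"A([^Z]*)ZZ",
}

results = {}
for name, pattern in patterns.items():
    m = re.search(pattern, text)
    # Store the whole match and what group 1 captured.
    results[name] = (m.group(0), m.group(1))
    print(f"{name:10} match={m.group(0)!r} group1={m.group(1)!r}")
```

The negated character class variant fails at the first A, because the Z after "ii" is followed by "o" rather than a second Z, so the search moves on and succeeds at the second A.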


殤城〤 2024-09-16 10:21:23


You need to use "(?=regex)" (lookahead), which checks for a particular pattern but doesn't include it in the result:

http:\/\/.*?(?=['\"\< >])
始终不够爱げ你 2024-09-16 10:21:23


Hmmm, I'd probably do this simply by saying "keep going until you get an unwanted character", like so:

http://[^'"< >]*

Escaped version (based on the question; I'm not sure which engine this is):

http:\/\/[^'\"\< >]*

However, the lookahead solution by polygenelubricants is a more flexible approach if the URL might contain some of those characters (but not at the end).
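
For comparison, here is a rough Python sketch that runs the negated character class pattern next to the lookahead pattern from the other answers. The sample text is invented, and its last URL deliberately sits at the end of the input with no terminator after it.

```python
import re

# Hypothetical sample text; the third URL has no terminator after it.
text = 'a http://domain.com/path.html" b http://domain.com/x.html< c http://domain.com/end'

# "Keep going until you get an unwanted character."
negated_urls = re.findall(r"http://[^'\"< >]*", text)

# Lazy match that must be followed by one of the terminator characters.
lookahead_urls = re.findall(r"http://.*?(?=['\"< >])", text)

print(negated_urls)
print(lookahead_urls)
```

As written, both patterns stop at the first terminator character they meet. The visible difference in this sketch is that the lookahead version needs a terminator to be present, so it skips the URL at the end of the input, while the negated character class also matches there.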
