Regular expression to detect a URL that contains another URL

Posted on 2024-12-19 06:47:27

XXXXXXhttp://something/something-http://directedto.com/XXXXXXX

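I have a list of strings like this one, where X stands for random extended ASCII characters. I can't find any web source that helps me write a regular expression to extract

http://something/something-http://directedto.com/

from the string. Could you give me a regex pattern that actually helps?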

Edit: the string above is just one example. There are other cases, such as:

XXXXXhttp://something/somehttp/qausiehfiuhakjh-/http://directedto.net/soemthignelseXXXXXXX XXXXXXXXXXhttp://www.yahoo.com/_ylt=Asq0NTMqTVFcCmnB3eR857SbvZx4;_ylu=X3oDMTNvZ2dtNnI1BGEDMQRjY29kZQNwemJ1Y WxsY2FonQRjcG9zAzIEZwMxBGludGwDdXMEbWNvZGUDchpidWFsbGNhaDUEbXBvcwMzBHBrZ3QDMgRwb3MDMQRzZWMDdGQtbG9jBHNsawN0 aXRsZQR0ZXN0AzcwMQR3b2UDMjQ1OTExNQ--/SIG=14l1h2t2v/EXP=1322779228/**http://www.nytimes.com/2011/12/01/nyreg ion/told-to-diversify-dock-union-offers-nearly-all-white-list.html%3Fsrc=me%26ref=nyregionXXXXXXXXXXXXXXX


赴月观长安 2024-12-26 06:47:27

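Detecting a URL is actually very difficult, because it can contain almost any character, including "random extended ASCII" ones. A good explanation of why it is so hard is here: http://daringfireball.net/2010/07/improved_regex_for_matching_urls. Unfortunately, that example assumes there is some kind of "word boundary" around the URL, which is not the case in your problem.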

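There is no way to reliably detect every possible URL, but you could make some assumptions. Perhaps your URLs all start with "http:" or "https:" and contain only alphanumeric characters, periods, and slashes? If so, this would work: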

https?:[a-zA-Z0-9./]+
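As a rough illustration, here is how that pattern behaves on the first example string in Python. This is only a sketch: the padding bytes standing in for the X characters, the helper names, and the widened `NESTED_PATTERN` are assumptions made for the demo and are not part of the original question or answer.

```python
import re

# Pattern from the answer: "http:" or "https:" followed by
# letters, digits, periods, and slashes.
URL_PATTERN = re.compile(r"https?:[a-zA-Z0-9./]+")

# Hypothetical widened variant: also allows "-" and ":" so that an
# outer URL and the URL embedded in it are captured as one match.
NESTED_PATTERN = re.compile(r"https?:[a-zA-Z0-9./:-]+")


def find_urls(text):
    """Return every substring matched by the answer's pattern."""
    return URL_PATTERN.findall(text)


def find_nested_urls(text):
    """Return every substring matched by the widened pattern."""
    return NESTED_PATTERN.findall(text)


# "\xfe\xe9\xa7" stands in for the random extended-ASCII padding
# (an assumption; the question only shows it as X).
sample = "\xfe\xe9\xa7http://something/something-http://directedto.com/\xa7\xe9\xfe"

print(find_urls(sample))
# ['http://something/something', 'http://directedto.com/']

print(find_nested_urls(sample))
# ['http://something/something-http://directedto.com/']
```

With the answer's pattern the hyphen ends the first match, so the outer and inner URLs come back as two separate results. The widened variant returns the combined string, but it would also swallow any padding that happens to consist of letters, digits, or the other allowed characters, so it only helps when the surrounding bytes fall outside the character class.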

If you update your question with better examples of the actual text you are trying to search, I can refine my pattern as needed.
