Regex to detect a URL that contains another URL
XXXXXXhttp://something/something-http://directedto.com/XXXXXXX
I have a list of strings like this one, where X stands for a random extended ASCII character. I can't find any regex on the web that helps me get the URLs out of the string. Could you provide a regex pattern that actually works?
EDIT: the string above is just one example. Other cases look like this, e.g.:
XXXXXhttp://something/somehttp/qausiehfiuhakjh-/http://directedto.net/soemthignelseXXXXXXX
XXXXXXXXXXhttp://www.yahoo.com/_ylt=Asq0NTMqTVFcCmnB3eR857SbvZx4;_ylu=X3oDMTNvZ2dtNnI1BGEDMQRjY29kZQNwemJ1YWxsY2FoNQRjcG9zAzIEZwMxBGludGwDdXMEbWNvZGUDcHpidWFsbGNhaDUEbXBvcwMzBHBrZ3QDMgRwb3MDMQRzZWMDdGQtbG9jBHNsawN0aXRsZQR0ZXN0AzcwMQR3b2UDMjQ1OTExNQ--/SIG=14l1h2t2v/EXP=1322779228/**http://www.nytimes.com/2011/12/01/nyregion/told-to-diversify-dock-union-offers-nearly-all-white-list.html%3Fsrc=me%26ref=nyregionXXXXXXXXXXXXXX
Comments (1)
Detecting a URL is actually very difficult, because it can contain almost any character, including "random extended ASCII" ones. A good explanation of why it's so hard is here: http://daringfireball.net/2010/07/improved_regex_for_matching_urls. Unfortunately, that example assumes there is some kind of "word boundary" around the URL, which is not the case for your problem.
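To illustrate the word-boundary issue, here is a small Python sketch. The `bounded` pattern is a simplified stand-in that I made up for this demonstration, not the pattern from the linked article, and the literal `X` characters stand in for the random characters in your strings.

```python
import re

# Sample from the question; the literal X's are placeholders for random characters.
sample = "XXXXXXhttp://something/something-http://directedto.com/XXXXXXX"

# Hypothetical simplified patterns: one requires a word boundary (\b)
# in front of the scheme, the other does not.
bounded = re.compile(r"\bhttps?://", re.ASCII)
unbounded = re.compile(r"https?://", re.ASCII)

# Start offsets of every scheme each pattern finds.
bounded_starts = [m.start() for m in bounded.finditer(sample)]
unbounded_starts = [m.start() for m in unbounded.finditer(sample)]

# The first "http" directly follows a word character (X), so there is no
# boundary there and the bounded pattern skips it. The second "http" follows
# "-", which does create a boundary, so both patterns find that one.
print(bounded_starts)
print(unbounded_starts)
```

With word characters glued to the front of the URL, the boundary-based pattern only sees the second URL, which is why that approach does not carry over directly.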
There isn't any way to reliably detect every possible URL, but you could make some assumptions. Perhaps your URLs all start with 'http:' or 'https:' and only contain alphanumeric characters, underscores and periods? This would work for that:
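As a minimal sketch under exactly those assumptions, something like the following Python could serve as a starting point. The names `URL_PATTERN` and `find_urls` are hypothetical, the `//` after the scheme is allowed explicitly, and `re.ASCII` is my own addition so that `\w` does not match extended characters.

```python
import re

# Assumptions (not a general URL matcher):
#   - every URL starts with "http://" or "https://"
#   - the rest contains only ASCII letters, digits, underscores and periods
# re.ASCII restricts \w to [A-Za-z0-9_], so extended characters end a match.
URL_PATTERN = re.compile(r"https?://[\w.]+", re.ASCII)


def find_urls(text):
    """Return every non-overlapping match of URL_PATTERN in text."""
    return URL_PATTERN.findall(text)


# Sample from the question; the literal X's are placeholders for random characters.
sample = "XXXXXXhttp://something/something-http://directedto.com/XXXXXXX"
print(find_urls(sample))
# → ['http://something', 'http://directedto.com']
```

Because slashes are not in the assumed character set, each match stops at the first `/` after the host, so paths and query strings are cut off. Widening the character class would capture more of each URL, at the cost of also swallowing more of the surrounding noise.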
If you update your question with better examples of the actual text you're trying to search in, I can improve my pattern as necessary.