Extract URLs from a string

Posted on 2024-10-06 20:28:47 · 184 words · 3 views · 0 comments

I'm trying to find a reliable way to extract a URL from a string of characters. I have a site where users answer questions, and in the source box, where they enter their source of information, I allow them to enter a URL. I want to extract that URL and make it a hyperlink, similar to how Yahoo Answers does it.

Does anyone know a reliable solution that can do this?

All the solutions I have found work for some URLs but not for others.

Thanks


Comments (5)

红焚 2024-10-13 20:28:47

John Gruber has spent a fair amount of time perfecting a "one regex to rule them all" for link detection. Used with preg_replace(), as mentioned in the other answers, the following regex should be one of the most accurate ways to detect a link, if not the most accurate:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

If you only want to match HTTP/HTTPS:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
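
As a rough sketch of how the HTTP/HTTPS pattern above might be wired into PHP: the helper name linkifyText() is hypothetical, and so are the choices to use `~` as the delimiter, to add the `u` modifier (the character class contains multi-byte quote characters), to HTML-escape everything, and to prefix `http://` when a match has no scheme. It uses preg_match_all() with offsets rather than preg_replace() so that the text between links can be escaped separately.

```php
<?php
// Hypothetical helper, not part of the original answer.
function linkifyText(string $text): string
{
    // Nowdoc keeps the quotes and backslashes in the pattern literal.
    $pattern = <<<'REGEX'
~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

    // No match (0) or a PCRE error (false): return the escaped text unchanged.
    if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $out = '';
    $pos = 0;
    foreach ($matches[0] as $match) {
        $url = $match[0];
        $offset = $match[1]; // byte offset of the match in $text

        // Escape the plain text between the previous link and this one.
        $out .= htmlspecialchars(substr($text, $pos, $offset - $pos), ENT_QUOTES, 'UTF-8');

        // Assumption: matches without a scheme (www.example.org) get http://.
        $href = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;

        $out .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
              . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</a>';
        $pos = $offset + strlen($url);
    }

    // Escape whatever follows the last link.
    return $out . htmlspecialchars(substr($text, $pos), ENT_QUOTES, 'UTF-8');
}

echo linkifyText('See http://example.com/page, and www.example.org.');
```

The trailing comma and period stay outside the links because the pattern's last character class excludes common punctuation.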

狂之美人 2024-10-13 20:28:47

$string = preg_replace('/https?:\/\/[^\s"<>]+/', '<a href="$0" target="_blank">$0</a>', $string);

It only matches http/https, but those are really the only protocols you'd want to turn into links. If you want others, you can change it like this:

$string = preg_replace('/(https?|ssh|ftp):\/\/[^\s"<>]+/', '<a href="$0" target="_blank">$0</a>', $string);
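
One thing to watch with a pattern like `https?:\/\/[^\s"<>]+` is that it also swallows punctuation that directly follows a URL, such as the period at the end of a sentence. A possible workaround, sketched below, is to switch to preg_replace_callback() and move trailing punctuation back outside the link. The function name linkifySimple() and the set of characters treated as trailing punctuation are assumptions for illustration, not part of the answer.

```php
<?php
// Hypothetical variant of the one-liner above.
function linkifySimple(string $text): string
{
    return preg_replace_callback(
        '/https?:\/\/[^\s"<>]+/',
        function (array $m): string {
            // Assumed punctuation set: characters that rarely end a real URL.
            $url = rtrim($m[0], '.,;:!?');
            // Whatever rtrim() removed is put back after the closing tag.
            $trail = substr($m[0], strlen($url));
            return '<a href="' . $url . '" target="_blank">' . $url . '</a>' . $trail;
        },
        $text
    );
}

echo linkifySimple('Docs: http://example.com/a.');
```

Like the original one-liner, this does no HTML escaping, so it assumes the input has already been made safe to output.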

茶底世界 2024-10-13 20:28:47

There are a lot of edge cases with URLs: a URL can contain brackets, omit the protocol, and so on. That's why a regex alone is not enough.

I created a PHP library that handles many of these edge cases: Url highlight.

You can extract URLs from a string or highlight them directly.
Example:

<?php

use VStelmakh\UrlHighlight\UrlHighlight;

$urlHighlight = new UrlHighlight();

// Extract urls
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']

// Make urls as hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
// return: 'Hello, <a href="http://example.com">http://example.com</a>.'

For more details, see the readme. For the URL cases covered, see the tests.

冰火雁神 2024-10-13 20:28:47

Yahoo! Answers does a fairly good job of link identification when the link is written properly and separate from other text, but it isn't very good at separating trailing punctuation. For example, "The links are http://example.com/somepage.php, http://example.com/somepage2.php, and http://example.com/somepage3.php." will include the commas in the first two links and the period in the third.

But if that is acceptable, then patterns like this should do it:

\<http:[^ ]+\>

It looks like Stack Overflow's parser is better. Is it open source?
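
A note on trying the pattern above in PHP: `\<` and `\>` are start-of-word and end-of-word anchors in some regex flavors (GNU grep, for example), but in PCRE they just match literal angle brackets. A rough PHP equivalent, assuming `\b` is an acceptable stand-in for both anchors, could look like this:

```php
<?php
$text = 'The links are http://example.com/somepage.php, '
      . 'http://example.com/somepage2.php, and http://example.com/somepage3.php.';

// \b replaces \< and \>; ~ is used as the delimiter to avoid escaping slashes.
preg_match_all('~\bhttp:[^ ]+\b~', $text, $m);

print_r($m[0]);
```

Because the match has to end at a word boundary, the engine backtracks past the trailing commas and the final period, so they are not included in the matches here. The same backtracking also drops a legitimate trailing slash (http://example.com/ is matched as http://example.com), and the pattern as written does not cover https.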

隐诗 2024-10-13 20:28:47

This code worked for me.

function makeLink($string){

    /*** make sure there is an http:// on all URLs ***/
    /* only matches www. when some non-word, non-slash character precedes it */
    $string = preg_replace("/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i", "$1http://$2", $string);

    /*** make all URLs links ***/
    /* the hyphen is escaped: an unescaped "-" right after \w inside a class
       can be rejected as an invalid range by newer PCRE versions */
    $string = preg_replace("/([\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/i", "<a target=\"_blank\" href=\"$1\">$1</a>", $string);

    /*** make all emails hot links ***/
    $string = preg_replace("/([\w\-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i", "<a href=\"mailto:$1\">$1</a>", $string);

    return $string;
}