How do I filter text that contains a URL using regular expressions?

Posted on 2024-11-28 02:38:29 · Word count 285 · Views 0 · Comments 0

I want to filter input text if it contains a URL.
By URL I mean anything that corresponds to a valid internet address, such as www.example.com, example.com, http://www.example.com, or http://example.com/foo/bar.

I think I have to use regular expressions and the preg_match function, so I need the correct regex pattern for this purpose.
I'd be very grateful if anybody could give me one.

Comments (3)

§对你不离不弃 2024-12-05 02:38:29

This article has a nice regex for matching URLs: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

For PHP you would need to escape the regex properly, for example like this:

$text = "here is some text that contains a link to www.example.com, and it will be matched.";
preg_match("/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $text, $matches);
var_dump($matches);
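
Since the question only asks whether the text contains a URL, a yes/no wrapper around preg_match may be all that is needed. The following is a minimal sketch, not part of the original answer: the function name contains_url and the simplified pattern are assumptions made for illustration, and the pattern is much looser than the Daring Fireball regex above.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when
// $text appears to contain a URL. The pattern is a simplified assumption.
function contains_url(string $text): bool
{
    // Branch 1: anything starting with http://, https:// or www.
    // Branch 2: a bare domain such as example.com, optionally followed by a path.
    $pattern = '~\b(?:https?://|www\.)[^\s<>"\']+'
             . '|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b(?:/[^\s<>"\']*)?~i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $text) === 1;
}

// Usage: reject input that looks like it contains a link.
$input = "here is some text that contains a link to www.example.com";
if (contains_url($input)) {
    echo "URL found, input rejected\n";
}
```

The bare-domain branch is what catches example.com, but it will also flag things like file.txt, so whether to keep it depends on how strict the filter has to be.
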

瘫痪情歌 2024-12-05 02:38:29

$html = "http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.";


preg_match_all('/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i', $html, $urls, PREG_PATTERN_ORDER);
$firstUrl = $urls[1][0]; // first matched URL; $urls itself is left intact so the loop below still works

Will match:

http://www.scroogle.org

http://www.scroogle.org/

http://www.scroogle.org/index.html

http://www.scroogle.org/index.html?source=library

You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.

To loop results you can use:

for ($i = 0; $i < count($urls[0]); $i++) {
    echo $urls[1][$i]."\n";
}

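will output (the last URL keeps the sentence's trailing period, because "." is allowed in the path character class):

http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.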

cheers, Lob
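
The pattern above also defines the named groups protocol, domain, file and parameters, which the loop does not use. Here is a rough sketch of reading them per URL. It swaps in PREG_SET_ORDER (instead of PREG_PATTERN_ORDER) so that each match arrives as its own array; the shortened $html string and the $parts variable are made up for the example.

```php
<?php
// Shortened sample input (an assumption for this sketch).
$html = "Docs at http://www.scroogle.org/index.html?source=library and https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi";

// Same pattern as in the answer above.
$pattern = '/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i';

// PREG_SET_ORDER: $matches[0] is the first URL with all its groups, and so on.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$parts = [];
foreach ($matches as $m) {
    $parts[] = [
        'url'        => $m[0],
        'protocol'   => $m['protocol'],
        'domain'     => $m['domain'],
        // Optional groups that did not take part in the match can be
        // missing from the array, so fall back to an empty string.
        'file'       => $m['file'] ?? '',
        'parameters' => $m['parameters'] ?? '',
    ];
}

print_r($parts);
```
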

肩上的翅膀 2024-12-05 02:38:29

Found here: http://zenverse.net/php-function-to-auto-convert-url-into-hyperlink/

Functions from WordPress.

function _make_url_clickable_cb($matches) {
    $ret = '';
    $url = $matches[2];

    if ( empty($url) )
        return $matches[0];
    // removed trailing [.,;:] from URL
    if ( in_array(substr($url, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($url, -1);
        $url = substr($url, 0, strlen($url)-1);
    }
    return $matches[1] . "<a href=\"$url\" rel=\"nofollow\">$url</a>" . $ret;
}

function _make_web_ftp_clickable_cb($matches) {
    $ret = '';
    $dest = $matches[2];

    // check for an empty match before adding the scheme; once prefixed, $dest is never empty
    if ( empty($dest) )
        return $matches[0];
    $dest = 'http://' . $dest;
    // removed trailing [.,;:] from URL
    if ( in_array(substr($dest, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($dest, -1);
        $dest = substr($dest, 0, strlen($dest)-1);
    }
    return $matches[1] . "<a href=\"$dest\" rel=\"nofollow\">$dest</a>" . $ret;
}

function _make_email_clickable_cb($matches) {
    $email = $matches[2] . '@' . $matches[3];
    return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
}

function make_clickable($ret) {
    $ret = ' ' . $ret;
    // in testing, using arrays here was found to be faster
    $ret = preg_replace_callback('#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_url_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_web_ftp_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_cb', $ret);

    // this one is not in an array because we need it to run last, for cleanup of accidental links within links
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}

Usage:

$string = 'I have some texts here and also links such as http://www.youtube.com , www.haha.com and [email protected]. They are ready to be replaced.';

echo make_clickable($string);
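
These functions turn URLs into links, while the question is about detecting them. A possible adaptation, sketched under that assumption, reuses the two URL patterns from make_clickable for a plain yes/no check. The function name has_link is invented for this example and is not part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress function): reuses the two URL
// patterns from make_clickable() to test whether $text contains a link.
function has_link(string $text): bool
{
    // Same trick as make_clickable(): a leading space lets [\s>] match
    // a URL that sits at the very start of the text.
    $text = ' ' . $text;

    $patterns = [
        // scheme://... URLs
        '#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is',
        // www. and ftp. hosts written without a scheme
        '#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is',
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $text) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(has_link('links such as http://www.youtube.com , www.haha.com'));
```

Note that neither pattern matches a bare domain such as example.com, so this check is narrower than what the question asks for.
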