PHP URL prettify

PHP: Find, replace, shorten, and prettify user-entered URLs with <a> tags, an ellipsis, and a link icon

Posted 2024-10-01 05:01:50

When a user enters a URL, e.g. http://www.google.com, I would like to be able to parse that text using PHP, find any links, and replace them with <a> tags that include the original URL as an HREF.

In other words, http://www.google.com will become

<a href="http://www.google.com">http://www.google.com</a>

I'd like to be able to do this for all URLs of these forms (with .com interchangeable with any TLD):

http://www.google.com
www.google.com
google.com
docs.google.com

What's the most performant way to do this? I could try writing some really fancy regex, but I doubt that's the best method available to me.

For bonus points, I'd also like to prepend http:// to any URL lacking it, and strip the display text itself down to something of the form http://www.google.com/reallyLongL... and display an external link icon afterwards.

巡山小妖精 2024-10-08 05:01:50

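Trying to find links in the form domain.com is going to be painful. It requires keeping track of every TLD and using them all in the search. If you didn't, the end of the previous sentence and the start of this one ("search.If") would be turned into a link to http://search.if. And even if you did, .in is both a valid TLD and a common word.

I'd recommend telling your users that links must begin with www. or http://, then writing a simple regex to capture them and add the links.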
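To make the ambiguity concrete, here is a minimal sketch comparing two patterns. Both patterns and the sample sentence are illustrative assumptions, not taken from the answer: the "naive" one accepts any dot-separated words, the other requires an http://, https://, or www. prefix.

```php
<?php
// Sample text: one real link, plus a sentence break typed without a space.
$text = 'See www.example.com for details.Then read on.';

// Hypothetical naive pattern: dot-separated words ending in 2+ letters.
$naive = '/\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b/i';

// Stricter pattern: the link must start with http://, https:// or www.
$prefixed = '/\b(?:https?:\/\/|www\.)[^\s"<>]+/i';

preg_match_all($naive, $text, $naiveHits);
preg_match_all($prefixed, $text, $prefixedHits);

print_r($naiveHits[0]);    // www.example.com and details.Then
print_r($prefixedHits[0]); // www.example.com only
```

The naive pattern treats "details.Then" as a hostname, which is exactly the false positive described above; requiring a prefix avoids it at the cost of missing bare domains.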

二货你真萌 2024-10-08 05:01:50

www.google.com

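That's not a URL, it's a hostname. It's generally not a good idea to start marking up bare hostnames in arbitrary text, because in the general case any word or sequence of dot-separated words is a perfectly valid hostname. That means you end up with horrible hacks like looking for a leading www. (and you'll get questions like "Why can I link www.stackoverflow.com but not stackoverflow.com?") or a trailing TLD (which becomes more and more impractical as new TLDs are introduced; "Why can I link ncm.com but not ncm.museum?"), and you'll often mark up things that aren't supposed to be links.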

I could try writing some really fancy regex

Well, I can't see how you'd do it without a regex.

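The trick is coping with markup. If the input can contain <, & and " characters, you mustn't let them reach the HTML output. If your input is plain text, you can do that by calling htmlspecialchars() before applying a simple replacement with a pattern like the ones in nico's answer.

(If the input already contains markup, you've got problems. You would probably need an HTML parser to work out which bits are markup, so that you avoid adding more markup inside them. Similarly, if you do more processing after this step and insert more markup, those later steps may hit the same difficulty. In "bbcode"-like languages this often leads to bugs and security problems.)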
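The escape-then-replace order can be sketched as follows. The function name linkify_plain_text and the pattern are assumptions made for illustration; the pattern also stops before the entities that htmlspecialchars() produces for quotes and angle brackets, so an escaped quote right after a URL does not become part of the link.

```php
<?php
// Hypothetical helper: escape plain text first, then wrap URLs in <a> tags.
function linkify_plain_text(string $text): string
{
    // Step 1: neutralise <, >, &, " and ' so user input cannot inject markup.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: link anything starting with http:// or https://.
    // The lookahead ends the URL before &quot; &lt; &gt; and &#039;,
    // while &amp; (a legitimate part of query strings) is kept.
    return preg_replace(
        '~\bhttps?://(?:(?!&(?:quot|lt|gt|#039);)\S)+~i',
        '<a href="$0">$0</a>',
        $escaped
    );
}

echo linkify_plain_text('Visit http://example.com/?a=1&b=2 <b>now</b>'), "\n";
```

Because the whole string is escaped before any tag is added, the only markup in the result is the markup the function itself inserts.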

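Another problem is trailing punctuation. People often put a full stop, comma, closing bracket, exclamation mark and so on after a link. These aren't supposed to be part of the link, yet they are valid URL characters. It's useful to strip them off and keep them out of the link. But then you break Wiki links that end in ), so you may want to not treat ) as a trailing character if there is a ( in the link, or something like that. This sort of thing can't be done with a simple regex replace, but it can be done in a replacement callback function.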
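One possible shape for such a callback is sketched below. The function name linkify_trimmed, the set of characters treated as trailing punctuation, and the parenthesis-counting rule are all assumptions; the sketch also assumes the text was already escaped and leaves entity handling aside.

```php
<?php
// Hypothetical helper illustrating trailing-punctuation handling in a callback.
function linkify_trimmed(string $escapedText): string
{
    return preg_replace_callback(
        '~\bhttps?://[^\s<>"]+~i',
        function (array $m): string {
            $url = $m[0];
            $trailing = '';
            // Peel punctuation off the end, one character at a time.
            // ";" is deliberately left out so a trailing entity stays intact.
            while ($url !== '' && strpos('.,:!?)', substr($url, -1)) !== false) {
                $last = substr($url, -1);
                // Keep a ")" that closes a "(" inside the URL itself.
                if ($last === ')' && substr_count($url, '(') >= substr_count($url, ')')) {
                    break;
                }
                $trailing = $last . $trailing;
                $url = substr($url, 0, -1);
            }
            // The stripped punctuation goes back after the closing tag.
            return '<a href="' . $url . '">' . $url . '</a>' . $trailing;
        },
        $escapedText
    );
}

echo linkify_trimmed('(see http://example.com/page).'), "\n";
echo linkify_trimmed('http://en.wikipedia.org/wiki/PHP_(disambiguation).'), "\n";
```

In the first call the ")." ends up outside the link; in the second only the full stop is removed, because the closing parenthesis balances one inside the URL.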

流年已逝 2024-10-08 05:01:50

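HTML Purifier has a built-in linkify feature that saves you all the trouble.

Its other features are also far too useful to pass up if you are handling any kind of user input that also has to be displayed.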

沉默的熊 2024-10-08 05:01:50

Some not-so-fancy regexps that should work (written for PHP's preg_* functions, which have no g flag):

/\b(https?:\/\/[^\s"<>]+)/i
/\b(www\.[^\s"<>]+)/i

Note that the last two forms in the question (google.com and docs.google.com) are impossible to match reliably, because you cannot distinguish google.com from something like this.Where I finish one sentence and don't put a space after the full stop.
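One thing to watch when applying the two patterns: run as two separate preg_replace passes, the www. pattern also fires inside the <a> tags produced by the first pass. The sketch below, with a made-up sample string and the patterns slightly adapted, shows the effect and one way around it, a single pass with an alternation and a callback.

```php
<?php
$text = 'Try http://www.google.com or www.example.org today';

// Two separate passes: the second pattern re-matches "www.google.com"
// inside the href and the link text written by the first pass.
$twoPass = preg_replace('/\b(https?:\/\/[^\s"<>]+)/i', '<a href="$1">$1</a>', $text);
$twoPass = preg_replace('/\b(www\.[^\s"<>]+)/i', '<a href="http://$1">$1</a>', $twoPass);

// One pass with an alternation: every URL is matched exactly once.
$onePass = preg_replace_callback(
    '/\b(?:https?:\/\/|www\.)[^\s"<>]+/i',
    function (array $m): string {
        // Prepend a scheme only when the match started with "www."
        $href = stripos($m[0], 'http') === 0 ? $m[0] : 'http://' . $m[0];
        return '<a href="' . $href . '">' . $m[0] . '</a>';
    },
    $text
);

echo $twoPass, "\n"; // nested, broken <a> tags
echo $onePass, "\n"; // two clean links
```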

As for shortening the URLs, having your URL in $url:

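if (strlen($url) > 20) { // Or whatever length you like
    $shortURL = substr($url, 0, 20) . "…";
} else {
    $shortURL = $url;
}

// Escape both values so characters such as & or " cannot break the markup
echo '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($shortURL) . '</a>';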

俏︾媚 2024-10-08 05:01:50

From http://www.exorithm.com/algorithm/view/markup_urls

function markup_urls ($text)
{
  // split the text into words
  $words = preg_split('/([\s\n\r]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $text = "";

  // iterate through the words
  foreach($words as $word) {

    // chopword = the portion of the word that will be replaced
    $chopword = $word;
    $chopword = preg_replace('/^[^A-Za-z0-9]*/', '', $chopword);

    if ($chopword <> '') {
      // linkword = the text that will replace chopword in the word
      $linkword='';

      // does it start with http://abc. ?
      if (preg_match('/^(http:\/\/)[a-zA-Z0-9_]{2,}.*/', $chopword)) {

        $chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
        $linkword = '<a href="'.$chopword.'" target="_blank">'.$chopword.'</a>';

      // does it equal abc.def.ghi ?
      } else if (preg_match('/^[a-zA-Z]{2,}\.([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,}(\/.*)?/', $chopword)) {

        $chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
        $linkword = '<a href="http://'.$chopword.'" target="_blank">'.$chopword.'</a>';

      // does it look like an e-mail address ?
      } else if (preg_match('/^[a-zA-Z0-9_\.]+\@([a-zA-Z0-9_]{2,}\.)+[a-zA-Z]{2,}.*/', $chopword)) {

        $chopword = preg_replace('/[^A-Za-z0-9]*$/', '', $chopword);
        $linkword = '<a href="mailto:'.$chopword.'">'.$chopword.'</a>';

      }

      // replace chopword with linkword in word (if linkword was set)
      if ($linkword <> '') {
        $word = str_replace($chopword, $linkword, $word);
      }
    }

    // append the word
    $text = $text.$word;
  }

  return $text;
}

不爱素颜 2024-10-08 05:01:50

I got this working exactly the way I want here:

<?php

$input = <<<EOF
http://www.example.com/
http://example.com
www.example.com
http://iamanextremely.com/long/link/so/I/will/be/trimmed/down/a/bit/so/i/dont/mess/up/text/wrapping.html
EOF;

// Callback: builds the <a> tag for one matched URL
function trimlong($match)
{
    $url = $match[0];
    $display = $url;
    if (strlen($display) > 30) {
        $display = substr($display, 0, 30) . "...";
    }
    // Bare www. matches need a scheme, or the browser treats the href as relative
    $href = preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
    return '<a href="' . $href . '">' . $display . ' <img src="http://static.goalscdn.com/img/external-link.gif" height="10" width="11" /></a>';
}

// Pass the function name; array($this, ...) only works inside a class method
$output = preg_replace_callback('#(https?://|www\\.)[^\\s<]+[^\\s<,.]#i', 'trimlong', $input);

echo $output;
