Convert URLs into hyperlinks in HTML without replacing src attribute values

Posted on 2024-09-18 08:19:22


I'm trying to convert URLs, but not if they come after src=". So far, I have this...

return preg_replace('@(?!^src=")(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $s);

It converts the URLs, but it also converts a URL that comes directly after src=".


老娘不死你永远是小三 2024-09-25 08:19:22


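Make that a negative lookbehind assertion, and drop the ^ anchor, which would otherwise limit the check to the very start of the string.

(?<!src=")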
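As a rough sketch of how the lookbehind could be applied, the snippet below wraps the replacement in a helper. The function name linkifyOutsideSrc and the sample input are made up for illustration, the URL pattern is a simplified variant of the one in the question, and the ^ is left out of the lookbehind so the check applies at any position in the string.

```php
<?php
// Hypothetical helper; not part of the original question or answer.
function linkifyOutsideSrc(string $s): string
{
    // (?<!src=") rejects a URL that is immediately preceded by src="
    $pattern = '@(?<!src=")(https?://[-\w.]+(?::\d+)?(?:/(?:[\w/_.-]*(?:\?\S+)?)?)?)@';

    return preg_replace($pattern, '<a href="$1" target="_blank">$1</a>', $s);
}

// Placeholder input: one bare URL and one URL inside a src attribute.
$input = 'See http://example.com/page and <img src="http://example.com/pic.png">';
echo linkifyOutsideSrc($input), PHP_EOL;
```

Only the bare URL should be wrapped here. Note that this guards against src=" alone: a URL after href=" or after single-quoted src=' would still be converted, which is one reason a DOM-based approach may be more robust.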


草莓酥 2024-09-25 08:19:22

I must infer the intent of this task in the absence of a minimal verifiable example.

By leveraging a legitimate DOM parser, you can largely prevent the matching of non-text nodes which contain otherwise qualifying URL values.

The snippet below uses an XPath query to skip any URL that is already the child of an <a> tag. Because only text() nodes are targeted, tag attribute values cannot be replaced.
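To see what such a query selects, here is a small sketch that only collects the matching text nodes without modifying them. The markup and URLs are placeholders chosen for illustration.

```php
<?php
// Illustrative markup: a bare URL, a URL in a src attribute,
// and a URL that is already the text of an <a> element.
$html = '<div>see http://example.com/a <img src="http://example.com/b.png">'
      . '<a href="http://example.com/c">http://example.com/c</a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Collect the value of every text node whose parent is not an <a> element.
$selected = [];
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $selected[] = $textNode->nodeValue;
}
var_dump($selected);
```

Only the text run before the <img> should be collected: the src value is an attribute rather than a text node, and the link text has an <a> parent. The query checks the direct parent only, so text nested deeper inside an <a> (for example in a <b> within it) would still be selected.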

The rest of the work happens while looping over the text nodes.

Use preg_match_all() to isolate one or more URLs in each text node, then create a new <a> element to replace the respective URL segment of text.

Use splitText() to split the text node at the start of the URL: the original node keeps the leading text, and the returned node holds the URL and everything after it.

Use replaceChild() to replace that returned node with the new <a> element.

Use insertBefore() to re-insert the text that originally followed the URL as a new text node directly after the <a> element.
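To make the splitText() step concrete, here is a minimal sketch on a single hand-built text node. The sample sentence and URL are placeholders, and the snippet assumes the mbstring extension is available.

```php
<?php
$dom = new DOMDocument();
$p = $dom->appendChild($dom->createElement('p'));
$textNode = $p->appendChild(
    $dom->createTextNode('before http://example.com/x after')
);

$url = 'http://example.com/x'; // placeholder URL

// Split at the character offset where the URL starts.
$rest = $textNode->splitText(mb_strpos($textNode->nodeValue, $url));

// The original node keeps the leading text ...
var_dump($textNode->nodeValue); // string(7) "before "
// ... and the returned sibling holds the URL plus whatever follows it.
var_dump($rest->nodeValue);     // string(26) "http://example.com/x after"
```

After the split, the <p> element has two adjacent text nodes, which is what lets the second one be swapped for an <a> element without touching the leading text.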

Code:

$html = <<<HTML
<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link http://example.com/number2 then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another HTTPS://www.example.net/booyah</p> and done
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$regex = '#\bhttps?://[-\w.]+(?::\d+)?(?:/(?:[\w/_.-]*(?:\?\S+)?)?)?#ui';
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $text = $textNode->nodeValue;
    foreach (preg_match_all($regex, $text, $m) ? $m[0] : [] as $url) {
        $a = $dom->createElement('a', htmlspecialchars($url));
        $a->setAttribute('href', $url);
        $mbPosOfUrlInText = mb_strpos($text, $url);
        // regurgitate any leading text as a new preceding node
        // then replace remainder of text with new hyperlink
        $textNode->parentNode->replaceChild(
            $a,
            $textNode->splitText($mbPosOfUrlInText)
        );
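        // add any text after url as new text node after new hyperlink
        $tailNode = $textNode->parentNode->insertBefore(
            $dom->createTextNode(
                mb_substr($text, $mbPosOfUrlInText + mb_strlen($url))
            ),
            $a->nextSibling
        );
        // continue in the trailing text so that any further URLs from
        // the same original text node are located at the right offset
        $textNode = $tailNode;
        $text = $tailNode->nodeValue;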
    }
}
echo $dom->saveHTML();

Output:

<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link <a href="http://example.com/number2">http://example.com/number2</a> then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another <a href="HTTPS://www.example.net/booyah">HTTPS://www.example.net/booyah</a></p> and done
</div>
