
PHP regex preg-replace

Regex to replace image links with their respective image tags

Posted on 2024-08-04 19:13:08 · 480 words · 5 views · 0 comments

I'm looking for a PHP preg_replace() solution to find links to images and replace them with the respective image tags.

Find:

<a href="http://www.domain.tld/any/valid/path/to/imagefile.ext">This will be ignored.</a>

Replace with:

<img src="http://www.domain.tld/any/valid/path/to/imagefile.ext" alt="imagefile" />

Where the protocol MUST be http://, the .ext MUST be a valid image format (.jpg, .jpeg, .gif, .png, .tif), and the base file name becomes the alt="" value.
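The three rules above can be checked on the href string by itself, separately from how the link is found in the HTML. A minimal sketch, assuming a hypothetical helper named `imageAltFromUrl()`, that extensions are matched case-insensitively, and that a query string after the file name should be ignored (the question does not say either way):

```php
<?php
// Hypothetical helper: returns the alt text (base file name) when $url meets
// the rules from the question, or null when it does not.
function imageAltFromUrl(string $url): ?string
{
    // Rule 1: the protocol must be exactly http:// (https:// is rejected).
    if (strncmp($url, 'http://', 7) !== 0) {
        return null;
    }
    // Inspect only the path, so "?size=large" or "#top" cannot hide the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    // Rule 2: the extension must be one of the listed image formats.
    $allowed = ['jpg', 'jpeg', 'gif', 'png', 'tif'];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($ext, $allowed, true)) {
        return null;
    }
    // Rule 3: the base file name (without its extension) becomes the alt value.
    return pathinfo($path, PATHINFO_FILENAME);
}
```

Keeping the rules in one function means the same check can be reused whether the links are found with a regex or with a DOM parser.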

I know preg_replace() is the right function for the job, but I suck with regex, so any help is greatly appreciated! THANKS!

执笏见 2024-08-11 19:13:08

Congratulations, you are the one millionth customer to ask Stack Overflow how to parse HTML with regular expressions!

[X][HT]ML is not a regular language and cannot be reliably parsed with regular expressions. Use an HTML parser. PHP itself gives you DOMDocument, or you may prefer simplehtmldom.
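A minimal sketch of the parser route with PHP's built-in DOMDocument, here paired with DOMXPath to select candidate links. The function name `linksToImages()` is hypothetical, and the protocol and extension rules are taken from the question; the regex is applied only to the href string, never to the HTML.

```php
<?php
// Hypothetical function: replaces qualifying <a> elements with <img> elements.
// Assumed rules (from the question): http:// only, image extension, alt = base name.
function linksToImages(string $html): string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them; a tree is still built.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // The XPath result list is a snapshot (not live), so replacing nodes in the loop is safe.
    foreach ($xpath->query('//a[starts-with(@href, "http://")]') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Capture the last path segment's base name before an allowed image extension.
        if (!preg_match('~^http://[^?#]*/([^/?#]+)\.(?:jpe?g|gif|png|tif)$~i', $href, $m)) {
            continue;
        }
        $img = $dom->createElement('img');
        $img->setAttribute('src', $href);
        $img->setAttribute('alt', $m[1]);
        $anchor->parentNode->replaceChild($img, $anchor);
    }
    return $dom->saveHTML();
}
```

Note that saveHTML() returns a complete document (doctype, html and body wrappers added by loadHTML()), not just the fragment that was passed in.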

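Incidentally, you can't tell a file's type by looking at its URL. There is no reason a JPEG has to have ".jpeg" as its extension; in fact, there's no guarantee that a file with a ".jpeg" extension actually is a JPEG. The only way to be sure is to fetch the resource (for example with a HEAD request) and look at the Content-Type header.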
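A minimal sketch of the HEAD-request check using PHP's cURL extension. Both function names are hypothetical and the 5-second timeout is an arbitrary choice; since some servers answer HEAD differently from GET, a negative result is best treated with some caution.

```php
<?php
// Hypothetical helper: true when a Content-Type value names an image type.
function isImageContentType($contentType): bool
{
    if (!is_string($contentType) || $contentType === '') {
        return false;
    }
    // Drop parameters such as "; charset=..." and normalise case.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));
    return strncmp($mime, 'image/', 6) === 0;
}

// Hypothetical helper: sends a HEAD request and inspects the Content-Type header.
// Requires the curl extension and network access.
function urlLooksLikeImage(string $url): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true, // keep the response out of the page output
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final resource
        CURLOPT_TIMEOUT        => 5,    // arbitrary limit for this sketch
    ]);
    if (curl_exec($ch) === false) {
        return false;
    }
    // Null when the server sent no valid Content-Type header.
    return isImageContentType(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
}
```

The header parsing is kept in its own function so it can be checked without touching the network.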

治碍 2024-08-11 19:13:08

Ahh, my daily DOM practice. You should use DOM to parse HTML and regex to parse strings such as HTML attributes.

Note: I have some basic regexes that could surely be improved upon by some wizards :)

Note #2: Though it adds overhead, you could use something like cURL to send a HEAD request and check the Content-Type header to confirm that the href really is an image. The extension check below should work in roughly 80-90% of cases.

<?php

$content = '

<a href="http://www.domain.tld/any/valid/path/to/imagefile.ext">This will be ignored.</a>
<br>

<a href="http://col.stb.s-msn.com/i/43/A4711309495C88F8CD154C99FCE.jpg">this will not be ignored</a>

<br>

<a href="http://col.stb.s-msn.com/i/A0/8E9A454F701E4F5F89E58E14B532C.jpg">bah</a>
';

$dom = new DOMDocument();
$dom->loadHTML($content);

$anchors = $dom->getElementsByTagName('a');

// The link must use the http:// protocol.
$protocol = '/^http:\/\//';
// Capture the base file name: the last path segment minus an image extension.
$ext = '/([^\/]+)\.(?:gif|jpg|jpeg|png|tif)$/';

// getElementsByTagName() returns a live list: replacing an <a> removes it
// from the list, so walk it backwards to avoid skipping elements.
for ($i = $anchors->length - 1; $i >= 0; $i--) {
    $anchor = $anchors->item($i);
    if (!$anchor->hasAttribute('href')) {
        continue;
    }
    $link = $anchor->getAttribute('href');

    if (preg_match($protocol, $link) && preg_match($ext, $link, $matches)) {
        $image = $dom->createElement('img');
        $image->setAttribute('src', $link);
        $image->setAttribute('alt', $matches[1]);
        $anchor->parentNode->replaceChild($image, $anchor);
    }
}

echo $dom->saveHTML();

踏月而来 2024-08-11 19:13:08

I would suggest using this more flexible non-greedy regex:

<a[^>]+?href=\"(http:\/\/[^\"]+?\/([^\"\/]*?)\.(jpg|jpeg|png|gif))[^>]*?>[^<]*?<\/a>
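The answer gives only the pattern; a minimal sketch of how it might be dropped into preg_replace() follows. The wrapper name `replaceImageLinks()` is hypothetical, the pattern is written with `~` delimiters so the slashes need no escaping, and the file-name group excludes `/` so the alt receives only the base name rather than the whole path. Like the pattern above, it only handles a double-quoted href and the four listed extensions.

```php
<?php
// Hypothetical wrapper around the non-greedy pattern from this answer.
function replaceImageLinks(string $html): string
{
    $regex = '~<a[^>]+?href="(http://[^"]+?/([^"/]*?)\.(jpg|jpeg|png|gif))[^>]*?>[^<]*?</a>~';
    // $1 = full image URL, $2 = base file name (no directories, no extension).
    return preg_replace($regex, '<img src="$1" alt="$2" />', $html);
}

echo replaceImageLinks('<a href="http://www.domain.tld/any/valid/path/to/imagefile.jpg">This will be ignored.</a>'), "\n";
// prints <img src="http://www.domain.tld/any/valid/path/to/imagefile.jpg" alt="imagefile" />
```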

And a more complex regex (including PHP test code) to hopefully please Gumbo :)

<?php
$test_data = <<<END
<a blabla="asldlsaj" alksjada="aslkdj" href="http://www.domain.tld/any/valid/path/to/imagefile.jpg" lkjasd=""asdlaskjd>This will be ignored.</a>
Lorem ipsum..
<a    blabla=asldlsaj alksjada="aslkdj" href="http://www.domain.tld/any/valid/path/to/imagefile.jpg" lkjasd=""asdlaskjd>This will be ignored.</a>
<a lkjafs='asdsa> ' blabla="asldlksjada=>"aslkdj" href="http://www.domain.tld/any/valid/path/to/imagefile.jpg" lkjasd=""asdlaskjd>This will be ignored.</a>
<a    blabla="ajada="aslk href="http://www.domain.tld/any/valid/path>/to/imagefile.jpg" lkjasd>asdlaskjd>This will be ignored.</a>
<a    blabla="asldlsaj>" aslkdj href="http://www.domain.tld/any/valid/path/ to/imagefile.jpg" lkjasd=""asdlaskjd>This will be ignored.</a>
Something:
<a    blabla='asldls<ajslkdj' href="http://www.domain.tld/any/valid'/path/to/imagefile.jpg" lkjasd=""asdlaskjd>This will be ignored.</a>
<a    blabla=  asldlsadj href="http://www.domain.tld/any/valid/path/to/imagefile.jpg" lkjasd>This will be ignored.</a>
<a blabla="asldlsaj" alksjslkdj" href='http://www.domain.tld/any/valid/path/to/imagefile.jpg' lkjasdskjd>This will be ignored.</a>
Something else...
<a    blabla="asldlsaj" alksjslkdj" href='http://www.domain.tld/any/valid/path/to/imagefile.jpg' lkjasdskjd>This will be ignored.</a>
<a    blabla="asldlsaj" alksjada="aslkdj" href=http://www.domain.tld/any/valid/path/to/imagefile.jpg lkjdlaskjdll> be ignored.</a>
<a href="http://www.domain.tld/any/valid/path/to/imagefile.jpg">This will be ignored.</a>
END;
// The href value has three alternatives (double-quoted, single-quoted, unquoted):
// $5/$8/$11 hold the image URL and $6/$9/$12 the base file name. Only one
// alternative matches, so the other backreferences are empty.
// The attribute runs before and after href are optional, so a bare
// <a href="..."> like the one in the question also matches.
$regex = "/<a(?:\s(\s*\w+(\s*=\s*(\".*?\"|'.*?'|[^'\">\s]+))?)+?)?\s+href\s*=\s*(\"(http:\/\/[^\"]+\/(.*?)\.(jpg|jpeg|png|gif))\"|'(http:\/\/[^']+\/(.*?)\.(jpg|jpeg|png|gif))'|(http:\/\/[^'\">\s]+\/([^'\">\s]+)\.(jpg|jpeg|png|gif)))(?:\s(\s*\w+(\s*=\s*(\".*?\"|'.*?'|[^'\">\s]+))?)+)?\s*>[^<]*?<\/a>/i";
$replaced = preg_replace($regex, '<img src="$5$8$11" alt="$6$9$12" />', $test_data);

echo '<pre>' . htmlentities($replaced) . '</pre>';
?>

About the author

慕巷
