Extract all images from an HTML string using preg_replace_callback()

Posted on 2024-10-27 21:33:22

Tricky preg_replace_callback question here. I am admittedly not great with PCRE expressions.

I am trying to extract all img src values from a string of HTML, save them to an array, and additionally replace each img src path with a local path (not a remote one). For example, surrounded by a lot of other HTML, I might have:

img src='http://www.mysite.com/folder/subfolder/images/myimage.png'

I would want to extract myimage.png to an array, and additionally change the src to:

src='images/myimage.png'

Can that be done?

Thanks

Comments (2)

没有伤那来痛 2024-11-03 21:33:22

Does it need to use regular expressions? Handling HTML is normally easier with DOM functions:

<?php

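$domd = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them;
// real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$domd->loadHTML(file_get_contents("http://stackoverflow.com"));
libxml_clear_errors();
libxml_use_internal_errors(false);

// Gather the src, alt and title attributes of every <img> element.
$items = $domd->getElementsByTagName("img");
$data = array();

foreach ($items as $item) {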
  $data[] = array(
    "src" => $item->getAttribute("src"),
    "alt" => $item->getAttribute("alt"),
    "title" => $item->getAttribute("title"),
  );
}

print_r($data);
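
The snippet above only collects attributes; it does not yet rewrite the src values as the question asks. A minimal sketch of how that could be added with the same DOM approach follows. The inline $html sample and the variable names $files and $rewritten are hypothetical, and it assumes the file name is simply the last segment of the URL path:

```php
<?php
// Hypothetical sample input; the question's real HTML is not shown.
$html = "<p>blah <img src='http://www.mysite.com/folder/subfolder/images/myimage.png' alt='demo'> blah</p>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parse warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$files = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    // Take only the path part of the URL, so a query string is ignored.
    $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
    if (!is_string($path)) {
        continue;                   // no usable path, leave the tag alone
    }
    $name = basename($path);
    if ($name === '') {
        continue;
    }
    $files[] = $name;                              // e.g. myimage.png
    $img->setAttribute('src', 'images/' . $name);  // local path
}

// saveHTML() serializes the whole modified document, including the
// html/body wrapper that loadHTML() adds around a fragment.
$rewritten = $doc->saveHTML();

print_r($files);
echo $rewritten;
```

Note that the serialized markup is normalized by the parser, so quoting and whitespace may differ from the input string.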

花之痕靓丽 2024-11-03 21:33:22

Do you need regex for this? Not necessarily. Is regex the most readable solution? Probably not, at least unless you are fluent in regex. Is regex more efficient when scanning large amounts of data? Usually, since PHP compiles each pattern on first use and caches the compiled form. Does regex win the "fewest lines of code" trophy? Judge for yourself:

$string = <<<EOS
<html>
<body>
blahblah<br>
<img src='http://www.mysite.com/folder/subfolder/images/myimage.png'>blah<br>
blah<img src='http://www.mysite.com/folder/subfolder/images/another.png' />blah<br>
</body>
</html>
EOS;

preg_match_all("%<img .*?src=['\"](.*?)['\"]%s", $string, $matches);
$images = array_map(function ($element) { return preg_replace("%^.*/(.*)$%", 'images/$1', $element); }, $matches[1]);

print_r($images);

Two lines of code, that's hard to undercut in PHP. It results in the following $images array:

Array
(
  [0] => images/myimage.png
  [1] => images/another.png
)

Please note that this won't work with PHP versions prior to 5.3 unless you replace the anonymous function with a proper one.
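
The two-liner above builds the array of local paths but leaves the HTML string itself unchanged. Since the question asks about preg_replace_callback, here is a hedged sketch of how both steps might be done in one pass. The sample markup, the pattern, and the names $images and $result are illustrative assumptions, and it presumes quoted src attributes whose URLs contain no query string:

```php
<?php
// Hypothetical sample input mirroring the question's example.
$html = <<<EOS
<p>blah<img src='http://www.mysite.com/folder/subfolder/images/myimage.png'>blah
<img alt="x" src="http://www.mysite.com/folder/subfolder/images/another.png" /></p>
EOS;

$images = array();

// Group 1: everything from "<img" up to and including "src=".
// Group 2: the opening quote, matched again by the \2 backreference.
// Group 3: the original src value.
$pattern = '%(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2%is';

$result = preg_replace_callback(
    $pattern,
    function ($m) use (&$images) {
        $name = basename($m[3]);   // last path segment of the URL
        $images[] = $name;         // remember the file name
        // Rebuild the attribute with the same quote character.
        return $m[1] . $m[2] . 'images/' . $name . $m[2];
    },
    $html
);

print_r($images);
echo $result;
```

Like any regex over HTML, this can misfire on unusual markup (unquoted attributes, a ">" inside an attribute value), which is where the DOM approach is more robust.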
