Screen scraping of image links in PHP

Posted 2024-09-10 06:56:28

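I have a website with many different product pages, and each page has a certain number of images in the same format across all pages. I want to screen-scrape each page's URL so that I can retrieve the URL of every image on each page. The idea is to build a gallery for each page made up of hotlinked images.

I know this can be done in PHP, but I'm not sure how to scrape multiple linked pages. Any ideas?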

〃安静 2024-09-17 06:56:28

I would recommend using a DOM parser, such as PHP's very own DOMDocument. Example:

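// Fetch the page source (example URL; requires allow_url_fopen)
$page = file_get_contents('http://example.com/images.php');
$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parse warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    echo htmlspecialchars($image->getAttribute('src')) . '<br />';
}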
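
The question also asks about handling several pages at once. One way to approach that, as a rough sketch only, is to wrap the DOMDocument logic in a function and run it over a list of page URLs. The names `extractImageUrls`, `scrapePages` and the `$fetch` callable are made up for illustration and are not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper: return the src of every <img> in an HTML string.
function extractImageUrls(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Collect parse warnings from sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $image) {
        $src = trim($image->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Hypothetical helper: map each page URL to the image URLs found on it.
// $fetch is any callable that returns the page source, or false on failure.
function scrapePages(array $pageUrls, callable $fetch): array
{
    $gallery = [];
    foreach ($pageUrls as $pageUrl) {
        $html = $fetch($pageUrl);
        $gallery[$pageUrl] = ($html === false) ? [] : extractImageUrls($html);
    }
    return $gallery;
}
```

Passing `'file_get_contents'` as `$fetch` would fetch each page over HTTP; taking the fetcher as a parameter keeps the parsing separate from the network, so a cURL-based or cached fetcher could be swapped in.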

神仙妹妹 2024-09-17 06:56:28

You can use a regular expression (regex) to go through the page source and parse all the IMG tags.

This regex will do the job quite nicely: <img[^>]+src="(.*?)"

How does this work?

// <img[^>]+src="(.*?)"
// 
// Match the characters "<img" literally «<img»
// Match any character that is not a ">" «[^>]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the characters "src="" literally «src="»
// Match the regular expression below and capture its match into backreference number 1 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the character """ literally «"»

Sample PHP code:

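// $subject holds the HTML source of the page
preg_match_all('/<img[^>]+src="(.*?)"/i', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[1]); $i++) {
    // $result[0][$i] is the whole match; the image URL (capture group 1) is in $result[1][$i]
}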
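
The pattern above only matches a `src` wrapped in double quotes. As a sketch of how it might be loosened, the variant below also accepts single quotes and skips attributes such as `data-src`. The function name `extractImageUrlsRegex` is hypothetical, and a regex remains fragile against unusual markup.

```php
<?php
// Hypothetical helper: regex-based extraction that accepts single or double quotes.
function extractImageUrlsRegex(string $subject): array
{
    // Group 1 is the quote character, group 2 the URL between the quotes.
    // \s before "src" avoids matching attributes such as data-src.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);
    return $result[2];
}
```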

You'll have to do a bit more work to resolve things like relative URLs.
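
A minimal sketch of that extra work, assuming the URL of the page each image came from is known. `resolveUrl` is a hypothetical helper and deliberately simplified: it is not a full RFC 3986 implementation and ignores details such as `<base href>`, trailing slashes and empty references.

```php
<?php
// Hypothetical helper: resolve an image src against the URL of the page it came from.
function resolveUrl(string $base, string $relative): string
{
    // Already absolute (http:, https:, data:, ...): nothing to do.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $relative)) {
        return $relative;
    }
    $parts = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    // Protocol-relative, e.g. //cdn.example.com/a.png
    if (strpos($relative, '//') === 0) {
        return $scheme . ':' . $relative;
    }
    $origin = $scheme . '://' . ($parts['host'] ?? '')
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($relative, '/') === 0) {
        $path = $relative; // root-relative
    } else {
        // Relative to the directory part of the base path.
        $basePath = $parts['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relative;
    }
    // Keep any query string or fragment out of the segment clean-up.
    $cut = strcspn($path, '?#');
    $tail = substr($path, $cut);
    $path = substr($path, 0, $cut);
    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    return $origin . '/' . implode('/', $out) . $tail;
}
```

For example, with the base `http://example.com/products/page1.php`, a `src` of `../img/b.png` would resolve to `http://example.com/img/b.png`.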

雅心素梦 2024-09-17 06:56:28

I really like PHP Simple HTML DOM Parser for things like this. An example of grabbing images is right there on the front page:

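// Load the library (adjust the path to wherever simple_html_dom.php lives)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}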
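
Whichever parser collects the URLs, the gallery the question describes could then be built from the resulting list. The following is only a sketch in plain PHP; `renderGallery` and the `gallery` class name are assumptions for illustration.

```php
<?php
// Hypothetical helper: build simple gallery markup from a list of image URLs.
function renderGallery(array $imageUrls): string
{
    $html = '<div class="gallery">' . "\n";
    // array_unique drops images that appear more than once on the page.
    foreach (array_unique($imageUrls) as $url) {
        // Escape the URL so it cannot break out of the src attribute.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= '  <img src="' . $safe . '" alt="">' . "\n";
    }
    return $html . '</div>' . "\n";
}
```

Escaping each URL matters here because the values come from scraped pages and should not be trusted as safe HTML.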
