Screen scraping image links in PHP

Posted 2024-09-10 06:56:28

I have a website with many product pages, and each page contains a number of images laid out in the same format across all pages. I want to screen scrape each page so I can retrieve the URL of every image on it. The idea is to build a gallery for each page made up of hotlinked images.

I know this can be done in PHP, but I am not sure how to scrape a page for multiple links. Any ideas?

Comments (4)

〃安静 2024-09-17 06:56:28

I would recommend using a DOM parser, such as PHP's very own DOMDocument. Example:

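// Fetch the page source (the URL is a placeholder).
$page = file_get_contents('http://example.com/images.php');

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);

// Print the src attribute of every <img> element.
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src') . '<br />';
}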
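
Since the question is about many product pages, the same idea could be wrapped in a function and run once per page. The following is only a sketch: extractImageUrls() is a made-up helper name, the page URLs are placeholders, and the inline HTML strings stand in for what file_get_contents() would return for each page.

```php
<?php
// Hypothetical helper (not part of PHP): return the unique src values
// of all <img> elements found in an HTML string.
function extractImageUrls(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Keep libxml quiet about malformed markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

// Assumed input: page URL => page source. In real use each value would be
// fetched with file_get_contents($pageUrl).
$pages = [
    'http://example.com/product-1.php' => '<div><img src="a.jpg"><img src="b.jpg"><img src="a.jpg"></div>',
    'http://example.com/product-2.php' => '<p>Unclosed paragraph <img src="/img/c.png">',
];

// One list of image URLs per page, ready to feed a per-page gallery.
$galleries = [];
foreach ($pages as $pageUrl => $html) {
    $galleries[$pageUrl] = extractImageUrls($html);
}

print_r($galleries);
```

Keeping the fetch separate from the parsing makes it easy to cache page sources or to swap file_get_contents() for cURL later.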
神仙妹妹 2024-09-17 06:56:28

You can use a regular expression (regex) to go through the page source and parse all the IMG tags.

This regex will do the job quite nicely: <img[^>]+src="(.*?)"

How does this work?

// <img[^>]+src="(.*?)"
// 
// Match the characters "<img" literally «<img»
// Match any character that is not a ">" «[^>]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the characters "src="" literally «src="»
// Match the regular expression below and capture its match into backreference number 1 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the character """ literally «"»

Sample PHP code:

preg_match_all('/<img[^>]+src="(.*?)"/i', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[1]); $i++) {
    // $result[0][$i] is the whole match; the image URL (capture group 1) is in $result[1][$i]
}

You'll have to do a bit more work to resolve things like relative URLs.
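
As a rough illustration of that extra work, here is a minimal sketch of resolving a src value against the URL of the page it was found on. resolveUrl() is a hypothetical helper, not a PHP built-in, and it only covers the common cases (absolute, protocol-relative, root-relative and document-relative paths); it is not a full RFC 3986 implementation and ignores things like the <base> tag or query-only references.

```php
<?php
// Hypothetical helper: turn an <img> src into an absolute URL,
// given the URL of the page that contained it.
function resolveUrl(string $base, string $src): string
{
    // Already absolute (starts with a scheme such as "http:"): leave as is.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';

    // Protocol-relative, e.g. //cdn.example.com/a.png
    if (strncmp($src, '//', 2) === 0) {
        return $scheme . ':' . $src;
    }

    if ($src !== '' && $src[0] === '/') {
        // Root-relative, e.g. /img/a.png
        $path = $src;
    } else {
        // Document-relative: append to the directory part of the base path.
        $basePath = $parts['path'] ?? '/';
        $slash    = strrpos($basePath, '/');
        $dir      = $slash === false ? '/' : substr($basePath, 0, $slash + 1);
        $path     = $dir . $src;
    }

    // Set any query string or fragment aside before normalising the path.
    $cut    = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path   = substr($path, 0, $cut);

    // Collapse "." and ".." segments.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }
    $trailing = ($segments && substr($path, -1) === '/') ? '/' : '';

    return $scheme . '://' . $host . $port . '/' . implode('/', $segments) . $trailing . $suffix;
}

// Usage with a placeholder page URL:
$pageUrl = 'http://example.com/products/page1.php';
echo resolveUrl($pageUrl, 'images/a.jpg'), "\n";
echo resolveUrl($pageUrl, '../img/b.jpg?v=2'), "\n";
```

Each URL captured by the regex would be passed through this function together with the address of the page it was scraped from.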

雅心素梦 2024-09-17 06:56:28

I really like PHP Simple HTML DOM Parser for things like this. An example of grabbing images is right there on the front page:

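// Load the library (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}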
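
Whichever parser collects the URLs, the gallery from the question could then be rendered along these lines. This is only a sketch: renderGallery() and the "gallery" class name are made up for illustration, and the URLs are placeholders.

```php
<?php
// Hypothetical helper: build simple gallery markup from a list of image URLs.
function renderGallery(array $imageUrls): string
{
    $html = '<div class="gallery">' . "\n";
    foreach ($imageUrls as $url) {
        // Escape the URL so characters like & or " cannot break the attribute.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= '  <img src="' . $safe . '" alt="">' . "\n";
    }
    return $html . '</div>' . "\n";
}

// Usage with placeholder URLs:
$markup = renderGallery([
    'http://example.com/img/a.jpg',
    'http://example.com/img/b.jpg?w=200&h=100',
]);
echo $markup;
```

Escaping matters here because scraped src values are untrusted input once they are echoed back into your own page.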
虫児飞 2024-09-17 06:56:28

You can use this to scrape pages:

http://simplehtmldom.sourceforge.net/

It requires PHP 5+, though.
