Scraping the full image src with PHP

Posted 2024-07-27 17:08:59 · 248 characters · 8 views · 0 comments

I am trying to scrape img src values with PHP. I can get the src fine, but if the src does not include the full path, I can't really reuse it. Is there a way to get the full path of the image using PHP? (Browsers can get it if you use the right-click menu.)

For example, how do I get a full path, including the domain, from either of the following two examples?

src="../foo/logo.png"
src="/images/logo.png"

Thanks,

Allan

Comments (2)

雪若未夕 2024-08-03 17:08:59

You don't need a regex, just some patience. I don't really want to write the code for you, but check whether the src starts with http:// or https://; if it doesn't, there are three cases:

  1. If it begins with a /, prepend http://domain.com
  2. If it begins with .., split the full URL of the page and hack off one path segment per .. until what is left of the src can be appended
  3. Otherwise (it begins with a letter), take the full URL of the page, strip it down to the last slash, then append the src.

Or... be lazy and steal this script:

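$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";

$com = InternetCombineUrl($url, $rel);

//  Returns http://www.goat.com/images/cheese.jpg

function InternetCombineUrl($absolute, $relative) {
    // A src that already has a scheme is absolute; return it untouched.
    $p = parse_url($relative);
    if (!empty($p["scheme"])) return $relative;

    // Read the base URL parts explicitly; with extract() any part missing
    // from the URL (user, pass, ...) would be an undefined variable.
    $base   = parse_url($absolute);
    $scheme = $base["scheme"] ?? "";
    $user   = $base["user"] ?? "";
    $pass   = $base["pass"] ?? "";
    $host   = $base["host"] ?? "";
    $path   = $base["path"] ?? "/";

    // $relative{0} was removed in PHP 8; substr() is also safe on an empty string.
    if (substr($relative, 0, 1) == '/') {
        // Root-relative: the path of the base document is ignored.
        $parts = explode("/", $relative);
    }
    else {
        // Document-relative: start from the directory of the base document.
        $dir = substr($path, 0, strrpos($path, "/") + 1);
        $parts = explode("/", $dir . $relative);
    }

    // Resolve "." and ".." with a stack so that runs like "../../" work.
    $cparts = array();
    foreach ($parts as $part) {
        if ($part === '' || $part === '.') {
            continue;
        }
        if ($part === '..') {
            array_pop($cparts);
            continue;
        }
        $cparts[] = $part;
    }
    $path = implode("/", $cparts);

    $url = "";
    if ($scheme) {
        $url = "$scheme://";
    }
    if ($user) {
        $url .= "$user";
        if ($pass) {
            $url .= ":$pass";
        }
        $url .= "@";
    }
    if ($host) {
        $url .= "$host/";
    }
    $url .= $path;
    return $url;
}

Adapted from http://www.web-max.ca/PHP/misc_24.php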

野味少女 2024-08-03 17:08:59

Unless you have the site URL you're starting with (in which case you can prepend it to the value of the src attribute) it seems like all you're left with there is a string.

I'm assuming you don't have access to any additional information of course. If you're parsing HTML, I'd assume you must be able to access an absolute URL to at least the HTML page, but perhaps not.
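
If you do know the URL of the page the HTML came from, a rough sketch of the whole round trip could look like the following. Everything here is an assumption for illustration: the helper names resolveSrc() and imageUrls(), the example.com page URL, and the sample markup are all made up, and the resolver only covers the simple cases (it ignores ports, user info, and a `<base href>` tag in the page). It relies on PHP's DOM extension being enabled.

```php
<?php
// Hypothetical helper: resolve an img src against the URL of the page it
// was found on. Ports, user info and <base href> are ignored in this sketch.
function resolveSrc(string $pageUrl, string $src): string
{
    // A src with its own scheme (http://, https://, ...) is already absolute.
    if (is_string(parse_url($src, PHP_URL_SCHEME))) {
        return $src;
    }

    $base = parse_url($pageUrl);

    // Protocol-relative src ("//cdn.example.org/x.png") reuses the page scheme.
    if (strncmp($src, '//', 2) === 0) {
        return $base['scheme'] . ':' . $src;
    }

    // Document-relative src: prefix the directory of the page.
    if (strncmp($src, '/', 1) !== 0) {
        $pagePath = $base['path'] ?? '/';
        $src = substr($pagePath, 0, strrpos($pagePath, '/') + 1) . $src;
    }

    // Collapse "." and ".." segments with a stack.
    $out = [];
    foreach (explode('/', $src) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($out);
            continue;
        }
        $out[] = $segment;
    }

    return $base['scheme'] . '://' . $base['host'] . '/' . implode('/', $out);
}

// Hypothetical helper: collect every img src in $html as an absolute URL.
function imageUrls(string $html, string $pageUrl): array
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = resolveSrc($pageUrl, $src);
        }
    }
    return $urls;
}

// Made-up sample input using the two src values from the question.
$html = '<p><img src="../foo/logo.png"><img src="/images/logo.png"></p>';
print_r(imageUrls($html, 'http://www.example.com/blog/post.html'));
```

With the sample page URL above, the two src values from the question should come out as http://www.example.com/foo/logo.png and http://www.example.com/images/logo.png.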
