How can I programmatically get the image on this page?

Posted on 2024-08-03 06:57:38

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.

If I open this URL in my browser (Firefox), the image shows up just fine. But when I use wget to fetch the same page, it fails!

Here's what I tried first:

wget -p http://www.fourmilab.ch/cgi-bin/Earth

Thinking that the other form fields were probably required too, I did a 'View Source' on the page, noted down the various field values, and then issued the following command:

wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth

Still no image!

Can someone please tell me what is going on here? Are there any 'gotchas' with CGI and/or form-POST-based wget requests? Where (book or online resource) are such concepts explained?
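
One likely gotcha in the POST attempt above is encoding: the lat and lon values contain a degree sign and an apostrophe. The GNU wget manual says --post-data expects key1=value1&key2=value2 content with percent-encoding for special characters. The helper below, urlencode, is a hypothetical name and not a wget feature; it is a minimal sketch that percent-encodes one value byte by byte, assuming the shell passes the text as UTF-8 so that ° becomes %C2%B0.

```shell
#!/bin/bash
# Hypothetical helper: percent-encode a single form value.
# Bytes in the RFC 3986 "unreserved" set (A-Z a-z 0-9 - . _ ~) are kept;
# every other byte becomes %XX, so UTF-8 "°" (C2 B0) becomes %C2%B0.
urlencode() {
  local out='' byte
  # od dumps the value as hex bytes; tr makes them uppercase, e.g. "37 C2 B0"
  for byte in $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F'); do
    case "$byte" in
      3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A]|2D|2E|5F|7E)
        # unreserved: turn hex into an octal escape and print the character
        out="$out$(printf "\\$(printf '%o' "0x$byte")")" ;;
      *)
        out="$out%$byte" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, urlencode "7°27'" prints 7%C2%B027%27, which can then be placed into a --post-data string or a GET query string. Whether the fourmilab script accepts these particular field values is not verified here.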

Comments (5)

说不完的你爱 2024-08-10 06:57:38

If you inspect the page's source code, you'll find an img tag whose src points to the image of the Earth. For example:

<img 
 src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229" 
 ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" /> 

Without the 'di' parameter, you are just requesting the whole web page, which only references this image, not the image itself.

Edit: the 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try, for example:

wget "http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F"
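
The two-step approach this answer describes (fetch the page, then fetch the image its img tag points to) can be sketched as a small shell function. extract_image_path is a hypothetical name, and it assumes the di value is a hex string, as in both examples in this thread. It matches the src attribute on its own, so it also works when the img tag is split across several lines as in the snippet above.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and print the first
# /cgi-bin/Earth?di=... image path found in an src attribute.
# Assumes di is a hex string, as in the examples in this thread.
extract_image_path() {
  grep -o 'src="/cgi-bin/Earth?di=[0-9A-Fa-f]*"' |
    head -n 1 |
    sed -e 's/^src="//' -e 's/"$//'
}
```

For example, pipe wget -q -O - "http://www.fourmilab.ch/cgi-bin/Earth" into it, then pass the printed path, prefixed with http://www.fourmilab.ch, to a second wget. Since di depends on the time, both requests should be made back to back.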

ら栖息 2024-08-10 06:57:38

Use GET instead of POST. To the CGI program running behind the page, the two are completely different.
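
Under the CGI specification (RFC 3875), the web server passes GET parameters to the program in the QUERY_STRING environment variable, while a POST body arrives on standard input with its length in CONTENT_LENGTH. A program that only reads QUERY_STRING never sees fields sent with wget --post-data. The hypothetical function below mimics the two paths; it is an illustration of the mechanism, not fourmilab's actual code.

```shell
#!/bin/bash
# Hypothetical sketch of where a CGI program finds its form fields.
# GET:  the web server puts them in QUERY_STRING.
# POST: they arrive on stdin, CONTENT_LENGTH bytes long.
cgi_params() {
  case "$REQUEST_METHOD" in
    GET)
      printf '%s\n' "$QUERY_STRING" ;;
    POST)
      head -c "${CONTENT_LENGTH:-0}"   # read exactly the request body
      echo ;;
    *)
      echo "unsupported method: $REQUEST_METHOD" >&2
      return 1 ;;
  esac
}
```

On the client side, the difference is only where the fields go: wget --post-data "opt=-p" URL sends them as a POST body, while wget "URL?opt=-p" sends the same field as a GET query string.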

南渊 2024-08-10 06:57:38

Following on from Ravadre,

wget -p http://www.fourmilab.ch/cgi-bin/Earth 

downloads an XHTML file that contains an <img> tag.

I edited the XHTML to remove everything but the img tag and turned it into a bash script containing another wget -p command, escaping the ? and = characters.

When I executed this, I got a 14 kB file, which I renamed earth.jpg.

The way I did it is not really programmatic, but I think it could be done.

But as @somedeveloper said, the di value keeps changing (since it depends on the time).
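
Because the di value changes over time, a stale or malformed request could plausibly return something other than an image, and renaming the file to .jpg would not reveal that. A minimal check, assuming the server returns JPEG data (as the earth.jpg and delme.jpeg names used in this thread suggest), is to look for the JPEG signature bytes FF D8 FF at the start of the file. is_jpeg is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if file $1 starts with the JPEG
# signature bytes FF D8 FF (an HTML page, for example, would not).
is_jpeg() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd8ff" ]
}
```

After downloading, something like is_jpeg earth.jpg || echo "not a JPEG" lets a script detect a bad download instead of silently keeping it. If the image turns out to be GIF or PNG, the signature to test for would differ.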

谁与争疯 2024-08-10 06:57:38

Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way: one that gets the image with the first wget itself, giving me the same experience I get when browsing with Firefox.

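#!/bin/bash

tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch
# Fetch the page and extract the time-dependent /cgi-bin/Earth?di=... image path
liveurl=$(wget -q -O - "$base/cgi-bin/Earth?opt=-p" | perl -0777 -nle 'if(m@<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" @gsix) { print "$1\n" }' )
[ -n "$liveurl" ] || { echo "image URL not found" >&2; exit 1; }
# $liveurl already starts with "/", so append it to $base directly
wget -q -O "$tmpf" "$base$liveurl"
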
鹿港巷口少年归 2024-08-10 06:57:38

What you are downloading is the whole HTML page, not the image. To download the image and other elements too, you need the --page-requisites (-p) option (and possibly --convert-links). Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image, which is located under /cgi-bin/. There is no dedicated command-line flag to disable the robots protocol, but wget's -e robots=off (which executes the .wgetrc command robots = off) turns it off.
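
To confirm that robots.txt is what stops wget, you can fetch the site's robots.txt and test the path against its Disallow rules. The function below, robots_disallows (a hypothetical name), is a deliberately simplified check: it treats every Disallow line as applying to all user agents and ignores Allow lines and wildcards, so it is a rough diagnostic rather than a full robots.txt parser.

```shell
#!/bin/bash
# Rough robots.txt check: reads robots.txt on stdin and succeeds (exit 0) if
# any Disallow rule is a prefix of the URL path given as $1.
# Simplification: User-agent groups, Allow lines and wildcards are ignored.
robots_disallows() {
  local path=$1 line rule cr
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}                  # tolerate CRLF line endings
    case "$line" in
      [Dd]isallow:*)
        rule=${line#*:}                 # text after "Disallow:"
        rule=${rule#"${rule%%[! ]*}"}   # strip leading spaces
        rule=${rule%% *}                # drop anything after the path
        # An empty Disallow allows everything; otherwise do a prefix match
        if [ -n "$rule" ] && [ "${path#"$rule"}" != "$path" ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}
```

For example: wget -q -O - http://www.fourmilab.ch/robots.txt | robots_disallows /cgi-bin/Earth && echo "blocked by robots.txt"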
