How can I programmatically fetch the image on this page?

Posted 2024-08-03 06:57:38

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.

If I issue this URL in my browser (FF), the image shows up just fine. But when I try 'wget' to fetch the same page, I fail!

Here's what I tried first:

wget -p http://www.fourmilab.ch/cgi-bin/Earth

Thinking that all the other form fields were probably required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following request:

wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth

Still no image!

Can someone please tell me what is going on here...? Are there any 'gotchas' with CGI and/or form-POST based wgets? Where (book or online resource) would such concepts be explained?

说不完的你爱 2024-08-10 06:57:38

If you inspect the page's source code, you will find a link with an img tag inside it, and that img is the image of the Earth. For example:

<img 
 src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229" 
 ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" />

Without the 'di' parameter, you are just asking for the whole web page, which only references this image, not for the image itself.

Edit: The 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try for example (the URL is quoted so the shell leaves the ? alone):

wget 'http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F'
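
The manual step described in this answer (read the page, find the img src, request it) could be scripted roughly as follows. This is only a minimal sketch: it assumes the img tag keeps the layout shown in the example above, and `extract_earth_img_src` is a hypothetical helper name, not something the site or wget provides.

```shell
#!/bin/bash
# Print the src of the first <img> whose src carries a di= token.
# Reads HTML on stdin. Assumes the attribute is written as src="...".
extract_earth_img_src() {
    # Flatten newlines first, because the tag spans several lines.
    tr '\n' ' ' |
        grep -o '<img[^>]*src="/cgi-bin/Earth?di=[^"]*"' |
        head -n 1 |
        sed 's/.*src="\([^"]*\)".*/\1/'
}

# Example (needs network access, so left commented out):
# src=$(wget -q -O - 'http://www.fourmilab.ch/cgi-bin/Earth' | extract_earth_img_src)
# wget -O earth.img "http://www.fourmilab.ch$src"
```

Because the di value differs from one request to the next, the page would have to be fetched again each time a fresh image is wanted.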

ら栖息 2024-08-10 06:57:38

Use GET instead of POST. The two look completely different to the CGI program on the server side.
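
To make the difference concrete: with GET the form fields travel in the URL's query string, whereas `--post-data` sends them in the request body. Below is a minimal sketch that builds such a GET URL from field=value pairs. The field names come from the question; `build_query` and `earth_get_url` are hypothetical helpers, and the values are assumed to be URL-safe already (characters such as ° or ' would need percent-encoding first).

```shell
#!/bin/bash
base=http://www.fourmilab.ch/cgi-bin/Earth

# Join field=value arguments with "&" to form a query string.
# Assumes every value is already URL-safe.
build_query() {
    local IFS='&'
    printf '%s\n' "$*"
}

# Print the GET URL for the given form fields (hypothetical helper).
earth_get_url() {
    printf '%s?%s\n' "$base" "$(build_query "$@")"
}

# Example (needs network access, so left commented out):
# wget -O earth.html "$(earth_get_url opt=-p imgsize=320)"
```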

南渊 2024-08-10 06:57:38

Following on from Ravadre,

wget -p http://www.fourmilab.ch/cgi-bin/Earth

downloads an XHTML file which contains an <img> tag.

I edited the XHTML to remove everything but the img tag and turned it into a bash script containing another wget -p command, escaping the ? and = characters.

When I executed this, I got a 14 kB file, which I renamed earth.jpg.

The way I did it is not really programmatic, but I think it could be automated.

But as @somedeveloper said, the di value is changing (since it depends on time).
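
Since the downloaded file was simply renamed to earth.jpg, a quick check that it really is an image (and not another HTML page) may be worthwhile when automating this. A minimal sketch, assuming the server returns JPEG data as the file name suggests; `looks_like_jpeg` is a hypothetical helper.

```shell
#!/bin/bash
# Succeed if the file starts with the JPEG magic bytes FF D8 FF.
looks_like_jpeg() {
    local magic
    # Dump the first three bytes as hex and strip od's spacing.
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "ffd8ff" ]
}

# Example:
# looks_like_jpeg earth.jpg || echo "earth.jpg does not look like a JPEG" >&2
```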

谁与争疯 2024-08-10 06:57:38

Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way: one that gets the image on the first wget itself, giving me the same user experience I get when browsing with Firefox.

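#!/bin/bash

tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch
# Fetch the page and extract the src of the <img> tag that carries the di= token.
liveurl=$(wget -O - "$base/cgi-bin/Earth?opt=-p" 2>/dev/null | perl -0777 -nle 'if(m@<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" @gsix) { print "$1\n" }' )
# liveurl already begins with "/", so append it to $base directly; quoted because it contains "?".
wget -O "$tmpf" "$base$liveurl" &>/dev/null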

鹿港巷口少年归 2024-08-10 06:57:38

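You are downloading the whole HTML page, not the image. To download the image and the other page elements as well, you need the --page-requisites (and possibly --convert-links) options. Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image, which is located under /cgi-bin/. As far as I know, there is no dedicated command-line flag for disabling the robots protocol, although the wgetrc setting robots = off can be passed on the command line with -e robots=off.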
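
For what it's worth, wget's manual documents a `robots` wgetrc command, and wgetrc commands can be given on the command line with `-e`. Below is a minimal sketch that combines it with the options mentioned above. `fetch_page_with_images` and the `WGET` override variable are hypothetical, added only so the command can be dry-run, and whether the result matches what a browser shows for this particular page is untested here.

```shell
#!/bin/bash
# Fetch a page plus the images it references, ignoring robots.txt.
# WGET can be overridden (e.g. for a dry run); it defaults to wget.
fetch_page_with_images() {
    "${WGET:-wget}" -e robots=off --page-requisites --convert-links "$1"
}

# Example (needs network access, so left commented out):
# fetch_page_with_images 'http://www.fourmilab.ch/cgi-bin/Earth?opt=-p'
```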