How can I programmatically fetch the image on this page?
The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.
If I open this URL in my browser (Firefox), the image shows up just fine. But when I try to fetch the same page with 'wget', it fails!
Here's what I tried first:
wget -p http://www.fourmilab.ch/cgi-bin/Earth
Thinking that all the other form fields were probably required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following command:
wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth
Still no image!
Can someone please tell me what is going on here? Are there any 'gotchas' with CGI and/or form-POST based wget requests? Where (book or online resource) would such concepts be explained?
Comments (5)
If you inspect the page's source code, you'll find a link with an img tag inside it that points to the image of the Earth. For example:
Without the 'di' parameter, you are just asking for the whole web page, which only references this image, not for the image itself.
Edit: the 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try for example
Use GET instead of POST. They're completely different for the CGI program in the background.
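To illustrate the GET suggestion, here is a minimal Python sketch that turns the form fields from the question into a percent-encoded query string. The field values are copied from the question's --post-data string; the helper name build_get_url is made up for this example. Whether the CGI program accepts all of these fields over GET, and whether it expects the degree sign encoded as UTF-8, are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.fourmilab.ch/cgi-bin/Earth"

# Field values copied from the question's --post-data string.
FORM_FIELDS = {
    "opt": "-p",
    "lat": "7°27'",
    "lon": "50°49'",
    "ns": "North",
    "ew": "East",
    "alt": "150889769",
    "img": "learth.evif",
    "date": "1",
    "imgsize": "320",
    "daynight": "-d",
}


def build_get_url(base_url, fields):
    """Return base_url with fields appended as a percent-encoded query string.

    urlencode() escapes characters such as the degree sign and the apostrophe
    (as UTF-8 by default), which the raw string in the question did not do.
    """
    return base_url + "?" + urlencode(fields)


if __name__ == "__main__":
    # The printed URL can be passed to wget (in quotes) as a plain GET request.
    print(build_get_url(BASE_URL, FORM_FIELDS))
```

Quoting the resulting URL on the command line keeps the shell from interpreting the & characters.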
Following on from Ravadre,
downloads an XHTML file which contains an <img> tag.
I edited the XHTML to remove everything but the img tag and turned it into a bash script containing another wget -p command, escaping the ? and = characters.
When I executed this, I got a 14kB file, which I renamed earth.jpg.
The way I did it isn't really programmatic, but I think it could be done.
But as @somedeveloper said, the di value is changing (since it depends on time).
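The manual two-step process above could be automated roughly as follows. This is only a sketch in Python, not what any of the posters ran: it fetches the page, picks the first img whose src contains "di=" (an assumption based on the answers here, since the real markup may differ), and then downloads that URL straight away, because the di value changes over time. The names ImgSrcCollector, find_image_url, fetch_bytes and download_earth_image are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = "http://www.fourmilab.ch/cgi-bin/Earth"


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing XHTML tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_url(html, page_url, marker="di="):
    """Return the absolute URL of the first <img> whose src contains marker."""
    collector = ImgSrcCollector()
    collector.feed(html)
    for src in collector.sources:
        if marker in src:
            return urljoin(page_url, src)
    return None


def fetch_bytes(url):
    """Download url with a plain GET request and return the body as bytes."""
    with urlopen(url) as response:
        return response.read()


def download_earth_image(page_url=PAGE_URL, out_path="earth.jpg",
                         fetcher=fetch_bytes):
    """Fetch the page, locate the image URL, and save the image to out_path."""
    html = fetcher(page_url).decode("utf-8", errors="replace")
    image_url = find_image_url(html, page_url)
    if image_url is None:
        raise ValueError("no <img> with a 'di' parameter found in the page")
    with open(out_path, "wb") as out_file:
        out_file.write(fetcher(image_url))
    return image_url


if __name__ == "__main__":
    print("saved image from", download_earth_image())
```

The fetcher parameter exists only so the logic can be exercised without network access; by default both requests go through urlopen.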
Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way: one that gets the image on the first wget itself, giving me the same user experience I get when browsing via Firefox.
What you are downloading is the whole HTML page and not the image. To download the image and other elements too, you'll need to use the --page-requisites (and possibly --convert-links) parameter(s). Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image, which is located under /cgi-bin/. wget honors robots.txt by default when fetching page requisites; it can be told to ignore it with -e robots=off.
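To see how a robots.txt rule of this kind affects a crawler, here is a small Python sketch using the standard library's urllib.robotparser. The robots.txt text below is a hypothetical stand-in written for this example, not a copy of the site's actual file, and the image URL with its di value is likewise made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, assumed for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.fourmilab.ch/index.html",
        "http://www.fourmilab.ch/cgi-bin/Earth?di=ABC123",  # made-up di value
    ):
        print(url, "->", is_allowed(SAMPLE_ROBOTS_TXT, "Wget", url))
```

A client that honors such a rule skips the image under /cgi-bin/ even though it downloads the page that references it.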