How to retrieve a webpage in Python, including any images
I'm trying to retrieve the source of a webpage, including any images. At the moment I have this:
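import urllib
# urlretrieve saves the page to disk and returns a (filename, headers) tuple
filename, headers = urllib.urlretrieve('http://127.0.0.1/myurl.php', 'urlgot.php')
print open(filename).read()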
which retrieves the source fine, but I also need to download any linked images.
I was thinking I could write a regular expression that searches for img src or similar in the downloaded source; however, I was wondering whether there is a urllib function that would retrieve the images as well, similar to this wget command:
wget -r --no-parent http://127.0.0.1/myurl.php
I don't want to use the os module to run wget, as I want the script to run on all systems. For the same reason, I can't use any third-party modules either.
Any help is much appreciated! Thanks
Comments (2)
Don't use a regex when there is a perfectly good parser built into Python.
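As a rough illustration of that advice, here is a minimal sketch that uses only the standard library. It assumes Python 3, where the parser lives in html.parser and urlretrieve has moved to urllib.request. The names ImageSrcParser, extract_image_urls, save_page_with_images, and the page_copy directory are made up for this example, and it assumes the page is UTF-8 encoded.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, base_url):
    """Return absolute image URLs, resolving relative src values."""
    parser = ImageSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def save_page_with_images(url, dest_dir="page_copy"):
    """Hypothetical helper: save the page and each image it references."""
    os.makedirs(dest_dir, exist_ok=True)
    with urlopen(url) as response:
        raw = response.read()
    with open(os.path.join(dest_dir, "index.html"), "wb") as page_file:
        page_file.write(raw)

    html = raw.decode("utf-8", errors="replace")  # assumption: UTF-8 page
    for index, image_url in enumerate(extract_image_urls(html, url)):
        # Prefix with the index so images sharing a file name do not collide.
        name = os.path.basename(urlparse(image_url).path) or "image"
        urlretrieve(image_url, os.path.join(dest_dir, "%d_%s" % (index, name)))
```

Calling save_page_with_images('http://127.0.0.1/myurl.php') would write index.html plus the images into page_copy. The sketch does not rewrite the src attributes in the saved HTML, and it ignores images referenced from CSS.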
Use BeautifulSoup to parse the returned HTML and search for image links. You might also need to recursively fetch frames and iframes.
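BeautifulSoup is a third-party package, which the question rules out, so the sketch below swaps in the standard library's html.parser to show the same idea of following frames and iframes recursively. It assumes Python 3. ResourceParser, fetch_html, and collect_images are hypothetical names, and the fetch parameter exists only so the traversal can be tried without a network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ResourceParser(HTMLParser):
    """Collect image sources and frame/iframe sources from one document."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.frames = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            return
        if tag == "img":
            self.images.append(src)
        elif tag in ("frame", "iframe"):
            self.frames.append(src)


def fetch_html(url):
    """Hypothetical helper: download a page, assuming it is UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_images(url, fetch=fetch_html, seen=None):
    """Return absolute image URLs for a page and every frame it embeds."""
    seen = set() if seen is None else seen
    if url in seen:
        # Guard against frames that include each other.
        return []
    seen.add(url)

    parser = ResourceParser()
    parser.feed(fetch(url))

    images = [urljoin(url, src) for src in parser.images]
    for frame_src in parser.frames:
        images.extend(collect_images(urljoin(url, frame_src), fetch, seen))
    return images
```

Each URL returned by collect_images could then be passed to urllib.request.urlretrieve to save the image locally.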