Is there a Python module that can scrape the image, title, and description of any link on the web?

Posted 2024-11-19 00:18:22 · 90 characters · 1 view · 0 comments

What I'm looking for should give me something like this -> [image: enter image description here]

不爱素颜 2024-11-26 00:18:22

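There are many APIs available that can accomplish your task (more precisely, the task you describe in the question, not the one in the image :)). I personally use diffbot, which I discovered while reading this. Note, however, that because of the nature of web pages, this kind of "content" extraction does not always succeed. It relies on heuristics and training, so it may not be good enough for your particular purpose...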

予囚 2024-11-26 00:18:22

If you want a screenshot of the entire page, something like https://stackoverflow.com/questions/1041371/alexa-api may help.

Otherwise, if you just want to grab a few key images from the page, mechanize can help. Once you have opened a web page, you can iterate over all the links on it using:

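for link in br.links():
    print(link.text, link.url)

where br is your mechanize browser object (for example br = mechanize.Browser(), followed by br.open(url)).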

You can see an example here:
"Download all the links (related documents) on a webpage using Python"

If you print dir(link), it will show you the link's attributes, such as link.text and link.url. Furthermore, you can import urlsplit from urlparse (urllib.parse in Python 3) and use it on the URL to split it into its components. You can then direct the browser to that URL and scrape the images, as shown in the example above.
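For readers who do not have mechanize installed, here is a minimal sketch of the same idea using only the Python 3 standard library. It is not the mechanize API: the class name PageSummaryParser, the helper summarize, and the sample HTML and URL are all made up for illustration. It collects the page title, the meta description, the link targets, and the image URLs from an HTML string, and uses urlsplit to pull the host out of the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageSummaryParser(HTMLParser):
    """Hypothetical helper: gathers title, description, links and images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = None
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept.
        if self._in_title:
            self.title += data


def summarize(html, base_url):
    """Parse an HTML string and return the pieces asked about in the question."""
    parser = PageSummaryParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "links": parser.links,
        "images": parser.images,
        "host": urlsplit(base_url).netloc,
    }


# Made-up sample page; in practice the HTML would come from a fetched URL.
SAMPLE_HTML = """<html><head><title>Example page</title>
<meta name="description" content="A short demo page."></head>
<body><a href="/docs">Docs</a><img src="img/logo.png"></body></html>"""

info = summarize(SAMPLE_HTML, "https://example.com/start/index.html")
print(info)
```

Like the API-based approach mentioned in the other answer, this is heuristic: many pages omit the meta description or load their images with JavaScript, in which case the corresponding fields simply stay empty.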
