
How do I scan a web page and get its images and YouTube embeds?

Posted 2024-07-08 17:07:53

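I'm building a web application that needs to get all the images and any embedded Flash videos (e.g. YouTube) on a given URL. I'm using Python.

I've googled this but haven't found any good information (probably because I don't know what this is called, so I don't know what to search for). Does anyone have experience with this and know how it can be done?

I'd love to see some code examples, if any are available.

Thanks!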


花落人断肠 2024-07-15 17:07:53

BeautifulSoup is a great screen-scraping library. Use urllib2 to fetch the page and BeautifulSoup to parse it. Here's a code sample from their docs:

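# Python 2 code using BeautifulSoup 3; on Python 3 the equivalents are
# urllib.request and the bs4 package.
import urllib2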
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print
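
The docs sample above scrapes table cells rather than media, so here is a rough sketch of the same fetch-and-parse idea applied to the question itself: collecting image URLs and YouTube embeds. To keep it runnable without installing anything, it swaps BeautifulSoup for the standard library's html.parser on Python 3. The names MediaCollector, extract_media, fetch_media and the YOUTUBE_HOSTS list are made up for illustration, and the set of tags checked (img, iframe, embed, object, param) is an assumption about how pages typically embed video, not an exhaustive rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical host list for illustration; extend it for other video sites.
YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "www.youtube-nocookie.com",
    "youtu.be",
}


class MediaCollector(HTMLParser):
    """Collects absolute image URLs and YouTube embed URLs from HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.videos = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag in ("iframe", "embed") and attrs.get("src"):
            self._add_video(attrs["src"])
        elif tag == "object" and attrs.get("data"):
            self._add_video(attrs["data"])
        elif tag == "param" and attrs.get("name") == "movie" and attrs.get("value"):
            # Old-style Flash embeds put the movie URL in a <param> tag.
            self._add_video(attrs["value"])

    def _add_video(self, url):
        # Resolve relative and protocol-relative URLs, then keep only
        # YouTube hosts, skipping URLs already recorded.
        full = urljoin(self.base_url, url)
        if urlparse(full).hostname in YOUTUBE_HOSTS and full not in self.videos:
            self.videos.append(full)


def extract_media(html, base_url):
    """Return (image_urls, video_urls) found in an HTML string."""
    collector = MediaCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images, collector.videos


def fetch_media(url):
    """Download a page and extract its media URLs."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_media(html, url)
```

Calling fetch_media with a page URL returns two lists: the image URLs and the YouTube embed URLs. Note that this only sees markup present in the raw HTML; images or players injected later by JavaScript would not show up.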
