Parsing HTML with Python and Beautiful Soup

Posted on 2024-11-18 01:54:48

<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>

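I'm using Beautiful Soup and have narrowed the page down to the HTML above. I now want to search through it and pull out the values: January 2010, Alaska, Owner, and Mad Dog Graphx. All of these values sit in divs with the same class ("profile-information"), but each one is preceded by a different label, such as "Member Since" or "AIGA Chapter". How can I search for "Member Since" and get back "January 2010", and then do the same for the other three fields?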



Comments (1)

国际总奸 2024-11-25 01:54:48
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
... ''', 'html.parser')
>>> # Each row holds exactly two text nodes: the label and its value.
>>> for row in soup.find_all('div', {'class': 'profile-row clearfix'}):
...     field, value = row.find_all(string=True)
...     print(field, value)
...
Member Since January 2010
AIGA Chapter Alaska
Title Owner
Company Mad Dog Graphx

You can of course do anything you want with field and value, such as building a dict from them or storing them in a database.
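The dict idea could look roughly like the sketch below. It assumes the `bs4` package (Beautiful Soup 4) on Python 3; `HTML` and `profile_to_dict` are illustrative names that do not appear in the original answer.

```python
from bs4 import BeautifulSoup

# The four rows from the question, reproduced so the sketch is self-contained.
HTML = '''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
'''


def profile_to_dict(html):
    """Map each row's label to its value (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    profile = {}
    # class_="profile-row" matches any div that has that class token,
    # so the extra "clearfix" class does not matter.
    for row in soup.find_all("div", class_="profile-row"):
        # Assumes every row contains exactly two text nodes.
        field, value = row.find_all(string=True)
        profile[field.strip()] = value.strip()
    return profile


profile = profile_to_dict(HTML)
print(profile["Member Since"])  # January 2010
```

Once the dict is built, each of the four fields is a plain key lookup, for example `profile["Company"]`.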

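If there are other divs or other text nodes inside the "profile-row clearfix" div, unpacking into exactly two names will fail, so you'll need to look up each part by its own class instead, with something like field = row.find('div', {'class':'profile-row-header'}).findAll(text=True), and likewise for the "profile-information" div.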
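A more defensive lookup along those lines might be sketched as follows, again assuming `bs4` on Python 3. The `SAMPLE` markup (with an extra `span` and nested `b` tags) and the `get_profile_value` helper are made up for illustration and are not from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the rows carry extra nodes, so "two text nodes
# per row" no longer holds.
SAMPLE = '''
<div class="profile-row clearfix">
  <span class="icon">*</span>
  <div class="profile-row-header">Member Since</div>
  <div class="profile-information"><b>January 2010</b></div>
</div>
<div class="profile-row clearfix">
  <div class="profile-row-header">Company</div>
  <div class="profile-information">Mad Dog Graphx</div>
</div>
'''


def get_profile_value(html, label):
    """Return the value that follows the given label, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    for header in soup.find_all("div", class_="profile-row-header"):
        if header.get_text(" ", strip=True) == label:
            # The value div is the next sibling div with the value class.
            info = header.find_next_sibling("div", class_="profile-information")
            return info.get_text(" ", strip=True) if info else None
    return None


print(get_profile_value(SAMPLE, "Member Since"))  # January 2010
```

Matching on the header text first and then stepping to its sibling keeps the lookup independent of how many other nodes a row contains.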

