Parsing HTML with Python and Beautiful Soup
<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
I'm using Beautiful Soup and have got as far as this part of the HTML. I now want to search through it and pull out the values January 2010, Alaska, Owner, and Mad Dog Graphx. All of these values share the same class, but each one is preceded by a different label, such as "Member Since" or "AIGA Chapter". How can I search for "Member Since" and then get "January 2010", and do the same for the other three fields?
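You can of course do anything you want with field and value (the header text and the information text of each row), like create a dict with them or store them in a database. If there are other divs or other text nodes within the "profile-row clearfix" div, you'll need to do something like
field = row.find('div', {'class':'profile-row-header'}).findAll(text=True)
and likewise for value. Note that findAll(text=True) returns a list of text nodes rather than a single string, so you will have to join the pieces yourself.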
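For reference, here is a minimal sketch of the whole approach run against the sample markup from the question. It is not the answerer's original code. It assumes the `bs4` package (Beautiful Soup 4), where `find_all` is the current spelling of `findAll`, and uses the built-in `html.parser`. The names `html`, `soup`, and `profile` are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each label ("Member Since", ...) to its value ("January 2010", ...).
profile = {}
for row in soup.find_all("div", class_="profile-row"):
    header = row.find("div", class_="profile-row-header")
    info = row.find("div", class_="profile-information")
    if header is None or info is None:
        continue  # skip rows that do not have both parts
    # get_text() gathers every nested text node, so it also copes with
    # extra tags inside the header or information div.
    field = header.get_text(" ", strip=True)
    value = info.get_text(" ", strip=True)
    profile[field] = value

print(profile["Member Since"])
```

Once the dict is built, looking up a field by its label is an ordinary key lookup, for example `profile["AIGA Chapter"]`. Matching on `class_="profile-row"` works here because Beautiful Soup compares against each individual class of a multi-class attribute such as "profile-row clearfix".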