Python - BeautifulSoup - HTML parsing
Here is a fragment of the site's code:
<td class='vcard' id='results100212571'>
<h2 class="custom_seeMore">
<a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
</h2>
<div class='clearer'></div>
<div class='adr'>
<span class='postal-code'>00000</span>
<span class='locality'>City</span>
<span class='street-address'>Address</span>
</div>
<p class="tel">Phone number</p>
and I try to parse it with:
for element in BeautifulSoup(page).findAll('td'):
    if element.find('a', {'class' : 'fn openPreview'}):
        print element.find('a', {'class' : 'fn openPreview'}).string
    if element.find('span', {'class' : 'postal-code'}):
        print element.find('span', {'class' : 'postal-code'}).string
    if element.find('span', {'class' : 'locality'}):
        print element.find('span', {'class' : 'locality'}).string
    if element.find('span', {'class' : 'street-address'}):
        print element.find('span', {'class' : 'street-address'}).string
    if element.find('p', {'class' : 'tel'}):
        print element.find('p', {'class' : 'tel'}).string
I know it's very amateur code, but it almost works. Every other class prints its content, but for 'fn openPreview' the line
    print element.find('a', {'class' : 'fn openPreview'}).string
prints None.
Please help me: how can I parse it?
Comments (1)
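According to the BeautifulSoup documentation, element.string will be None if element has multiple children. In your case, the <a> tag has two children (the text "Hotel Name" and the nested <span class="seeMore">), so .string returns None; reading the tag's first child instead will print "Hotel Name".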
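To make the behaviour concrete, here is a minimal sketch of the point above. It assumes Python 3 and the bs4 package (the question's code appears to use Python 2 and an older BeautifulSoup), parses with the built-in "html.parser", and looks the link up by the single class openPreview; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the fragment from the question.
html = """
<h2 class="custom_seeMore">
<a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching on one of the tag's classes is enough to find the link.
link = soup.find("a", class_="openPreview")

# The <a> tag has two children (a text node and a <span>),
# so .string cannot pick a single string and returns None.
print(link.string)

# The first child is the text node that holds the hotel name.
first_text = str(link.contents[0])
print(first_text)

# get_text() concatenates all nested text, including the <span>.
all_text = link.get_text()
print(all_text)
```

Taking contents[0] relies on the name being the first child of the link; if the markup changes so that the <span> comes first, that index would need adjusting.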