Using BeautifulSoup to extract the content inside a tag
I'd like to extract the content Hello world. Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:
<table border="0" cellspacing="2" width="800">
<tr>
<td colspan="2"><b>Name: </b>Hello world</td>
</tr>
<tr>
...
I tried the following:
hello = soup.find(text='Name: ')
hello.findPreviousSiblings
But it returned nothing.
In addition, I'm also having a problem extracting My home address:
<td><b>Address:</b></td>
<td>My home address</td>
I'm using the same method to search for text="Address: ", but how do I navigate down to the next line and extract the content of the <td>?
The .contents attribute works well for extracting text from <tag>text</tag>. It applies to both the <td>My home address</td> example and the <td><b>Address:</b></td> example.
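As a minimal sketch of the .contents approach (not the answerer's original code), assuming the bs4 package and Python's built-in "html.parser", the two snippets from the question could be handled like this. The variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Parse the two snippets from the question (parser choice is an assumption).
address_value = BeautifulSoup("<td>My home address</td>", "html.parser")
address_label = BeautifulSoup("<td><b>Address:</b></td>", "html.parser")

# .contents is a list of a tag's direct children, not a single string.
value_children = address_value.find("td").contents
label_children = address_label.find("td").find("b").contents

# Each list holds exactly one text node here; convert it to a plain str.
value_text = str(value_children[0])
label_text = str(label_children[0])

print(value_text)
print(label_text)
```

Because .contents returns a list, the text has to be picked out by index; for a tag with a single text child, the .string attribute or get_text() would likely be more convenient.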
Use .next instead. .next and .previous let you move through the document elements in the order they were processed by the parser, while the sibling methods work with the parse tree.
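A small sketch may help show why the sibling lookup in the question comes back empty while document-order navigation works. It assumes bs4, where .next_element is the documented name for what this answer calls .next, and it closes the truncated table from the question so that it parses cleanly.

```python
from bs4 import BeautifulSoup

html = (
    '<table border="0" cellspacing="2" width="800">'
    '<tr><td colspan="2"><b>Name: </b>Hello world</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

# Find the text node itself (string= is the bs4 spelling of text=).
label = soup.find(string="Name: ")

# "Name: " is the only child of <b>, so in the parse tree it has no siblings.
siblings_before = label.find_previous_siblings()
sibling_after = label.next_sibling

# In document order, the node parsed right after "Name: " is "Hello world".
name_value = str(label.next_element)

print(list(siblings_before), sibling_after, name_value)
```

Note also that the question writes hello.findPreviousSiblings without parentheses, which only references the method rather than calling it.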
Use the code below to extract text and content from HTML tags with Python BeautifulSoup.
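The code for this answer is not shown here, so the following is only a hedged sketch of one way to cover both cases from the question, not the answerer's code. It assumes bs4 with "html.parser"; the helper name value_for_label and the combined sample table are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample combining both rows from the question.
html = """
<table>
<tr><td colspan="2"><b>Name: </b>Hello world</td></tr>
<tr><td><b>Address:</b></td><td>My home address</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def value_for_label(soup, label):
    """Return the text that follows a bold label, or None if it is absent.

    Hypothetical helper, written for illustration only.
    """
    for bold in soup.find_all("b"):
        # Compare stripped text so "Name: " matches the label "Name:".
        if bold.get_text(strip=True) != label:
            continue
        # Case 1: the value is a text node right after <b> in the same cell.
        same_cell = bold.next_sibling
        if isinstance(same_cell, str) and same_cell.strip():
            return same_cell.strip()
        # Case 2: the value sits in the next <td> of the same row.
        next_cell = bold.find_parent("td").find_next_sibling("td")
        if next_cell is not None:
            return next_cell.get_text(strip=True)
    return None


print(value_for_label(soup, "Name:"))
print(value_for_label(soup, "Address:"))
```

Searching by label text rather than by position is meant to cope with the several similar tables the question mentions; if the same label appears more than once on the page, this sketch returns the first match only.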