Extracting text from spans without IDs using BeautifulSoup
Does anyone know how to extract the text from each span in a p tag using BeautifulSoup? I'm trying to figure this out in Python. I'm using a Craigslist car listing.
This is what I was able to accomplish so far:
#retrieve post spans
spans = soup.find_all(class_='attrgroup')
print(spans[1].prettify())
Ideally, I'm trying to create a dictionary.
Example:
attributes = {
    "condition": "good",
    "cylinders": "8 cylinders",
    "drive": "4wd",
    # etc.
}
Output:
<p class="attrgroup"><span>condition:<b>good</b></span><br/><span>cylinders:<b>8 cylinders</b></span><br/><span>drive:<b>4wd</b></span><br/><span>fuel:<b>gas</b></span><br/><span>odometer:<b>138000</b></span><br/><span>paint color:<b>blue</b></span><br/><span>size:<b>full-size</b></span><br/><span>title status:<b>clean</b></span><br/><span>transmission:<b>automatic</b></span><br/><span>type:<b>pickup</b></span><br/></p>
Comments (2)
Try this:
Output:
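One way to do this can be sketched as follows. This is a minimal illustration, not the answer's original code: it assumes every span has the form label:<b>value</b>, and the html sample and the spans_to_dict helper are hypothetical names introduced here.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question (assumed structure).
html = (
    '<p class="attrgroup">'
    "<span>condition:<b>good</b></span><br/>"
    "<span>cylinders:<b>8 cylinders</b></span><br/>"
    "<span>drive:<b>4wd</b></span><br/>"
    "</p>"
)


def spans_to_dict(group):
    """Build a {label: value} dict from the spans inside one attrgroup tag."""
    result = {}
    for span in group.find_all("span"):
        # get_text(strip=True) joins the pieces, e.g. "condition:good".
        text = span.get_text(strip=True)
        # Split on the first colon only, so values may contain colons.
        key, sep, value = text.partition(":")
        if sep:  # skip spans that do not follow the label:value pattern
            result[key.strip()] = value.strip()
    return result


soup = BeautifulSoup(html, "html.parser")
group = soup.find("p", class_="attrgroup")
print(spans_to_dict(group))
# {'condition': 'good', 'cylinders': '8 cylinders', 'drive': '4wd'}
```

On a real listing page, the same helper could be applied to each element returned by soup.find_all(class_='attrgroup').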
You could use stripped_strings in case the pattern is always the same.
Example:
Output:
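A rough sketch of the stripped_strings idea, not the answer's original code: it assumes the strings inside the tag strictly alternate between a label ending in a colon and its value. The html sample and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question (assumed structure).
html = (
    '<p class="attrgroup">'
    "<span>condition:<b>good</b></span><br/>"
    "<span>cylinders:<b>8 cylinders</b></span><br/>"
    "<span>drive:<b>4wd</b></span><br/>"
    "</p>"
)

soup = BeautifulSoup(html, "html.parser")
group = soup.find("p", class_="attrgroup")

# stripped_strings yields every non-empty text node with whitespace trimmed:
# ['condition:', 'good', 'cylinders:', '8 cylinders', 'drive:', '4wd']
strings = list(group.stripped_strings)

# Pair even-indexed labels with odd-indexed values and drop the trailing colon.
attrs_by_label = {
    label.rstrip(":"): value
    for label, value in zip(strings[::2], strings[1::2])
}
print(attrs_by_label)
# {'condition': 'good', 'cylinders': '8 cylinders', 'drive': '4wd'}
```

This is shorter than walking each span, but it mispairs labels and values if any span is missing its b tag, so it only suits pages where the pattern really is always the same.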