I can't parse this with Beautiful Soup
```html
<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
My Name is Alis I am a python learner...
</div>
<div class="class-3" style="style-2 clear: both;">
alis
</div>
</div>
<br /></td>
```
I want to extract the description:
My Name is Alis I am a python learner...
I have tried a lot of things but couldn't figure out the best way. Can you give me a general solution for this?
This returns the string you are looking for (note that it is a Unicode string and keeps any surrounding whitespace).
This works by parsing the HTML, grabbing the first td tag and its contents, collecting every div tag nested inside the first div tag, selecting the third item in that list (list index 2), and taking the last of its contents.
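A sketch of those steps, reconstructed from the description above rather than copied from the answer. It assumes the `bs4` package and Python's built-in `html.parser`:

```python
from bs4 import BeautifulSoup

HTML = """<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
My Name is Alis I am a python learner...
</div>
<div class="class-3" style="style-2 clear: both;">
alis
</div>
</div>
<br /></td>"""

soup = BeautifulSoup(HTML, "html.parser")

# First <td>, then the first <div> inside it (the outer wrapper).
outer_div = soup.td.div

# find_all() is recursive, so this holds every nested <div> in document order:
# [style="aaaaa", class-a, the unnamed description <div>, class-3]
divs = outer_div.find_all("div")

# Index 2 is the description <div>; its last child is the trailing text node.
description = divs[2].contents[-1]

print(description.strip())
```

Because `contents[-1]` is the raw text node, it still carries the newlines around the sentence, and `.strip()` removes them.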
In BeautifulSoup there are a LOT of ways to do this, so this answer probably hasn't taught you much. I genuinely recommend reading the tutorial that David suggested.
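One of those other ways, shown here as an alternative that is not part of the answer, is to anchor on the `<b>My Description</b>` label instead of on list positions, so inserting an extra `<div>` earlier in the page does not shift the result. It assumes the label text is exactly `My Description` and that the description is the text node right after the following `<br />`:

```python
from bs4 import BeautifulSoup

HTML = """<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
My Name is Alis I am a python learner...
</div>
<div class="class-3" style="style-2 clear: both;">
alis
</div>
</div>
<br /></td>"""

soup = BeautifulSoup(HTML, "html.parser")

# Locate the <b> tag by its text rather than by its position in the tree.
label = soup.find("b", string="My Description")

# The <br /> after the label is a sibling; the text node right after it is the description.
br = label.find_next_sibling("br")
description = br.next_sibling.strip()

print(description)
```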
Have you tried reading the examples provided in the documentation? The quick start is located here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick Start
Edit:
To find
You would load your html up via
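A minimal sketch of loading and finding, assuming the `bs4` package and Python's built-in `html.parser`. The `load_soup` helper and any file path passed to it are hypothetical:

```python
from bs4 import BeautifulSoup


def load_soup(path):
    """Hypothetical helper: parse an HTML file from disk with the built-in parser."""
    with open(path, encoding="utf-8") as fh:
        return BeautifulSoup(fh, "html.parser")


# Parsing directly from a string works the same way.
html = '<td><div><div class="class-a">My name is alis</div></div></td>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; class_ avoids clashing with Python's `class` keyword.
name_div = soup.find("div", class_="class-a")
print(name_div.get_text())
```

Passing the parser name explicitly keeps results consistent across machines, since `BeautifulSoup` otherwise picks whichever parser is installed.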
Also remember that you can do most of this in the Python console, using dir() together with help() to walk through what you're trying to do. Trying out IPython or Python's IDLE might make life easier, since both have very friendly consoles for beginners.
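A short sketch of that exploratory workflow, assuming the `bs4` package: `dir()` on a parsed tag lists its attributes, and filtering the names narrows them to the search methods worth passing to `help()` next.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td><div class="class-a">My name is alis</div></td>', "html.parser")
tag = soup.td

# dir() lists everything a Tag exposes; keep only the names of the search helpers.
finders = sorted(name for name in dir(tag) if name.startswith("find"))
print(finders)

# Next step at an interactive prompt: help(tag.find_all) shows that method's documentation.
```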