I can't parse this with Beautiful Soup

Posted on 2024-10-02 09:59:33

<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
          My Name is Alis I am a python learner...
        </div>
<div class="class-3" style="style-2 clear: both;">
          alis
        </div>
</div>
<br /></td>

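I want to get the description after scraping it:

My Name is Alis I am a python learner...

I tried a lot of things, but I could not figure out the best way. Can you give a general solution for this?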

Comments (2)

穿透光 2024-10-09 09:59:33
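from bs4 import BeautifulSoup  # Beautiful Soup 4; the legacy 3.x import was `from BeautifulSoup import BeautifulSoup`
soup = BeautifulSoup("Your sample html here", "html.parser")
soup.td.div('div')[2].contents[-1]  # last child of the third nested <div>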

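This will return the string you are looking for (note that it is a Unicode string and still includes any surrounding whitespace).

This works by parsing the HTML, grabbing the first td tag and its contents, grabbing all div tags within the first div tag, selecting the third item in that list (list index 2), and grabbing the last of its contents.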
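
The traversal described above can be spelled out one step at a time. This is only a sketch: it assumes Beautiful Soup 4 (the bs4 package) with the built-in "html.parser", and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# The sample HTML from the question.
html = """<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
          My Name is Alis I am a python learner...
        </div>
<div class="class-3" style="style-2 clear: both;">
          alis
        </div>
</div>
<br /></td>"""

soup = BeautifulSoup(html, "html.parser")

first_td = soup.td                       # first <td> in the document
outer_div = first_td.div                 # first <div> inside that <td>
inner_divs = outer_div.find_all("div")   # same as outer_div('div'): all nested <div> tags
target_div = inner_divs[2]               # third nested <div> (list index 2)
description = target_div.contents[-1].strip()  # last child is the text node; strip the whitespace

print(description)
```

Calling .strip() at the end removes the leading and trailing whitespace that the raw text node carries.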

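In BeautifulSoup there are a lot of ways to do this, so this answer probably hasn't taught you much. I genuinely recommend that you read the tutorial that David suggested.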
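
As one illustration of the "many ways" remark, here is a sketch that does not depend on the position of the div. It is not the approach from this answer: it assumes the description always follows a b tag whose text is exactly "My Description", and it uses Beautiful Soup 4 with a trimmed copy of the sample HTML.

```python
from bs4 import BeautifulSoup, NavigableString  # assumption: Beautiful Soup 4

# Trimmed fragment of the sample HTML, enough to show the idea.
html = """<div>
<b>My Description</b><br />
          My Name is Alis I am a python learner...
        </div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the label, then walk its following siblings until a non-empty text node appears.
label = soup.find("b", string="My Description")
description = ""
for sibling in label.next_siblings:
    if isinstance(sibling, NavigableString) and sibling.strip():
        description = sibling.strip()
        break

print(description)
```

Anchoring on the label text tends to survive layout changes better than a hard-coded index, but it breaks if the label wording changes.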

允世 2024-10-09 09:59:33

Have you tried reading the examples provided in the documentation? The quick start is located here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick Start

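Edit:
To find a specific tag, you would load your HTML via:

 from bs4 import BeautifulSoup  # Beautiful Soup 4; the legacy 3.x import was `from BeautifulSoup import BeautifulSoup`
 soup = BeautifulSoup("My html here", "html.parser")
 myDiv = soup.find("div", { "class" : "class-a" })  # first <div> whose class is "class-a"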

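Also remember that you can do most of this in the Python console, using dir() along with help() to walk through what you're trying to do. It might make life easier to try IPython or Python's IDLE, which have very friendly consoles for beginners.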
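
A rough idea of what that exploration might look like, written as a script so it can be run as-is. It assumes Beautiful Soup 4; the one-line HTML string and the variable names are placeholders for illustration.

```python
import inspect

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4

soup = BeautifulSoup('<div class="class-a">My name is alis</div>', "html.parser")
my_div = soup.find("div", {"class": "class-a"})

# dir() lists what the tag object offers; hide the private names to keep it readable.
public_names = [name for name in dir(my_div) if not name.startswith("_")]
print(public_names)

# In an interactive console, help(my_div.find_all) shows this docstring plus the signature.
print(inspect.getdoc(my_div.find_all))

# Try out one of the discovered methods.
print(my_div.get_text())
```

In a console you would type these lines one at a time and inspect each result before moving on.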
