Help parsing the text between <pre> tags using BeautifulSoup

Posted 2024-11-18 02:32:22, 1216 characters, 0 views, 0 comments

I am attempting to parse information from a website using BeautifulSoup and Python. The HTML is shown below. I want my parsed data to look like:

ID Definition
Lysine biosynthesis - Burkholderia pseudomallei 17
... with the rest of the data taken from the same position (inside the "pre" tags and outside the "a" tags).

How can I do this?

<pre>ID                   Definition
    ----------------------------------------------------------------------------------------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>             Lysine biosynthesis - Burkholderia pseudomallei 17 
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>             Arginine and proline metabolism - Burkholderia pse 
<a href="/kegg-bin/show_pathway?bpm01100">bpm01100</a>             Metabolic pathways - Burkholderia pseudomallei 171 
<a href="/kegg-bin/show_pathway?bpm01110">bpm01110</a>             Biosynthesis of secondary metabolites - Burkholder 
</pre>

I have tried:

y = soup.find('pre')  # returns the <pre> element and its contents (specific to KEGG)
for a in y:
    z = a.string

This gave me:

 ID                   Definition
----------------------------------------------------------------------------------------------------

Thanks for the help!

Comments (1)

圈圈圆圆圈圈 2024-11-25 02:32:22

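BeautifulSoup() and its search methods return a hierarchical parse-tree object, not just a string. Calling findChildren() on the node you found returns only its descendant tags (here the <a> elements), so iterating over it skips the header line. Note that a.string is then the ID inside each link; the definition is the text node that follows each <a>: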

for a in soup.find('pre').findChildren():
    z = a.string
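
To pair each ID with the definition text that sits outside the "a" tags, one option is to read the text node following each link. The following is a minimal sketch, not a definitive solution: it assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser", and the names html and rows are illustrative only. The sample markup is copied from the question, including its truncated definitions.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (definitions are truncated in the source).
html = """<pre>ID                   Definition
----------------------------------------------------------------------------------------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>             Lysine biosynthesis - Burkholderia pseudomallei 17
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>             Arginine and proline metabolism - Burkholderia pse
<a href="/kegg-bin/show_pathway?bpm01100">bpm01100</a>             Metabolic pathways - Burkholderia pseudomallei 171
<a href="/kegg-bin/show_pathway?bpm01110">bpm01110</a>             Biosynthesis of secondary metabolites - Burkholder
</pre>"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for a in soup.find("pre").find_all("a"):
    # The definition is not inside a tag: it is the text node
    # (a NavigableString) that immediately follows each <a>.
    following = a.next_sibling
    definition = str(following).strip() if following is not None else ""
    rows.append((a.get_text(), definition))

for pathway_id, definition in rows:
    print(pathway_id, definition)
```

Each entry in rows is an (ID, definition) tuple, so dropping the ID is a matter of keeping only the second element. The header and the dashed separator never appear because they are plain text before the first "a" tag, not siblings that follow one.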