BeautifulSoup: get a value using a string

Posted on 2024-11-07 06:36:43

Is it possible to use a string such as 'title.titletext' to get the value of a nested tag?

XML structure:

book
   title
      titletext
book
   title
      titletext

Code:

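from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3

soup = BeautifulStoneSoup(xml_data)  # xml_data: the XML document as a string
for book in soup.findAll('book'):
    print(book.title.titletext.string)
    #book.get_by_string('title.titletext').string is this possible?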

If it's not possible, does getattr support multiple levels?

getattr(book, 'title.titletext').string

I did some testing and this doesn't seem to be possible but maybe there is an alternative?

If there isn't, I guess I have to write my own recursive function to find the attribute?
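A recursive function isn't strictly needed: getattr only follows one level, but folding it over the dotted path with functools.reduce walks any depth. This is a minimal sketch assuming attribute-style child access as in the question's book.title.titletext (BeautifulSoup resolves tag.title to the first descendant <title> tag). The helper name get_by_path is hypothetical, and SimpleNamespace objects stand in for parsed tags so the example runs without a parser.

```python
from functools import reduce
from types import SimpleNamespace


def get_by_path(obj, dotted_path):
    """Follow a dotted attribute path such as 'title.titletext'.

    get_by_path(book, 'title.titletext') is equivalent to book.title.titletext.
    Raises AttributeError if a step cannot be resolved (BeautifulSoup returns
    None for a missing child tag, so the following step then fails).
    """
    return reduce(getattr, dotted_path.split("."), obj)


if __name__ == "__main__":
    # Stand-in for a parsed <book><title><titletext>...</titletext></title></book>
    book = SimpleNamespace(
        title=SimpleNamespace(titletext=SimpleNamespace(string="Example title"))
    )
    print(get_by_path(book, "title.titletext").string)  # Example title
```

Inside the loop from the question, the call would be get_by_path(book, 'title.titletext').string.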

Answer by 彼岸花似海, 2024-11-14 06:36:43

I would suggest looking into ElementTree. It has what you need. As a quick example:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

doc = ET.parse(filename)  # filename: path to your XML file
for e in doc.iter('title'):  # iter() replaces the removed getiterator()
    book_title = e.attrib['titletext']

Obviously I'm not handling error conditions, but using try/except or checking whether 'titletext' is in e.attrib is sufficient.
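A minimal sketch of the membership check mentioned above, parsing an inline sample instead of a file. The sample document, with titletext stored as an attribute of <title> as in the first example, is an assumption; Element.get() is the standard alternative and returns None when the attribute is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: titletext stored as an attribute of <title>
SAMPLE = """<library>
  <book><title titletext="First book"/></book>
  <book><title/></book>
</library>"""

root = ET.fromstring(SAMPLE)

titles = []
for e in root.iter("title"):
    # Skip <title> elements without the attribute instead of raising KeyError
    if "titletext" in e.attrib:
        titles.append(e.attrib["titletext"])

# Alternative: e.get() returns None when the attribute is missing
titles_or_none = [e.get("titletext") for e in root.iter("title")]

print(titles)          # ['First book']
print(titles_or_none)  # ['First book', None]
```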

If titletext is a child tag rather than an attribute of title, the same approach works with a small change:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

doc = ET.parse(filename)  # filename: path to your XML file
for e in doc.iter('titletext'):  # iter() replaces the removed getiterator()
    book_title = e.text
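ElementTree also covers the original question directly: Element.find() and Element.findtext() accept a slash-separated path, so the dotted string from the question only needs '.' replaced with '/'. A sketch using an inline sample document, which is an assumption shaped like the question's structure:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the book/title/titletext structure
SAMPLE = """<library>
  <book><title><titletext>First book</titletext></title></book>
  <book><title><titletext>Second book</titletext></title></book>
</library>"""

root = ET.fromstring(SAMPLE)
path = "title.titletext"  # the string form used in the question

results = []
for book in root.iter("book"):
    # 'title/titletext' means "a <titletext> child of a <title> child";
    # findtext() returns None if nothing matches
    results.append(book.findtext(path.replace(".", "/")))

print(results)  # ['First book', 'Second book']
```

The same path syntax works from the root, e.g. root.findall('book/title/titletext').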

In general, I've found ElementTree easier to work with than BeautifulSoup, at least for the kinds of things that I work with. I've found that it's slightly faster for our cases and it handles cases like yours more easily (in my opinion).

HTH.
