Problem accessing attributes in BeautifulSoup

Posted on 2024-11-04 17:48:24

I am having problems using Python (2.7). The code basically consists of:

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)

for x in z.findAll('el'):
    # if 'at' in x:
    # if hasattr(x, 'at'):
        print x['at']   
    else:
        print 'nothing'

I expected the first if statement to work correctly (i.e., print "nothing" when at doesn't exist), but it always prints "nothing" (i.e., the test is always False). The second if, on the other hand, is always True, so the code raises a KeyError when it tries to access at on the second <el> element, which of course has no such attribute.

Comments (4)

清风不识月 2024-11-11 17:48:24

The in operator is for sequence and mapping types; what makes you think the object returned by BeautifulSoup is supposed to implement it correctly? According to the BeautifulSoup docs, you should access attributes using the [] syntax.

Re hasattr, I think you confused HTML/XML attributes with Python object attributes. hasattr is for the latter, and BeautifulSoup AFAIK doesn't reflect the HTML/XML attributes it parsed in its own object attributes.

P.S. note that the Tag object in BeautifulSoup does implement __contains__ - so maybe you're trying with the wrong object? Can you show a complete but minimal example that demonstrates the problem?


Running this:

from BeautifulSoup import BeautifulSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(str)

for x in z.findAll('el'):
    print type(x)
    print x['at']

I get:

<class 'BeautifulSoup.Tag'>
some
<class 'BeautifulSoup.Tag'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'

Which is what I expected. The first el has an at attribute, the second doesn't - and this throws a KeyError.


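Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not its attributes, so applying in directly to a tag does not test for an attribute. To check whether an attribute exists, use x.has_key('at') or compare x.get('at') against None.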
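
The behaviour described in this answer can be reproduced with a toy class. The sketch below is written for Python 3 and is only a stand-in, not BeautifulSoup's own code: MiniTag and its fields are hypothetical names, and the __getattr__ rule (unknown Python attribute names are treated as child-tag lookups that return None when nothing is found) is an assumption about why hasattr is always True in the question.

```python
class MiniTag:
    """Toy stand-in for a parsed tag; NOT BeautifulSoup's real implementation."""

    def __init__(self, name, attrs=None, contents=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.contents = list(contents or [])

    def __contains__(self, item):
        # Membership looks at child nodes, not at markup attributes.
        return item in self.contents

    def __getitem__(self, key):
        # Subscript access reads markup attributes; missing keys raise KeyError.
        return self.attrs[key]

    def __getattr__(self, name):
        # Only called when normal lookup fails. Assumed rule: treat the name
        # as a child tag to search for and return None if there is no match.
        if name.startswith('__'):
            raise AttributeError(name)
        for child in self.contents:
            if isinstance(child, MiniTag) and child.name == name:
                return child
        return None

    def get(self, key, default=None):
        # Safe attribute read with a fallback value.
        return self.attrs.get(key, default)


tags = [MiniTag('el', {'at': 'some'}, ['ABC']), MiniTag('el', {}, ['DEF'])]

results = []
for x in tags:
    # (membership test, hasattr test, safe attribute read)
    results.append(('at' in x, hasattr(x, 'at'), x.get('at', 'nothing')))

for row in results:
    print(row)
```

Under these assumptions the membership test is False for both elements and hasattr is True for both, which matches what the question reports; only get (or a guarded []) distinguishes the element that really has the attribute.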

孤芳又自赏 2024-11-11 17:48:24

If your code is as simple as you provided, you can solve it in a compact way with:

for x in z.findAll('el'):
    print x.get('at', 'nothing')
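
For readers on Python 3 without BeautifulSoup installed, the same get-with-default pattern can be tried with the standard library. This is a swapped-in illustration using xml.etree.ElementTree, not BeautifulSoup; the root wrapper element is added only because ElementTree needs a single top-level element.

```python
import xml.etree.ElementTree as ET

# Wrap the two sibling elements in a root so the fragment is well-formed XML.
root = ET.fromstring('<root><el at="some">ABC</el><el>DEF</el></root>')

# Element.get returns the fallback when the attribute is absent.
values = [el.get('at', 'nothing') for el in root.findall('el')]

for value in values:
    print(value)
```

The fallback replaces the if/else from the question, so no KeyError can occur.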

拒绝两难 2024-11-11 17:48:24

To just scan for an element by tag name, a pyparsing solution might be more readable (and avoids deprecated APIs like has_key):

from pyparsing import makeXMLTags

# sample input, taken from the question
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el,elEnd = makeXMLTags('el')

# scan the input text and work with elTags
for elTag, tagstart, tagend in el.scanString(xmltext):
    if elTag.at:
        print elTag.at

For an added refinement, pyparsing allows you to define a filtering parse action so that tags will only match if a particular attribute-value (or attribute-anyvalue) is found:

# import parse action that will filter by attribute
from pyparsing import withAttribute

# only match el tags having the 'at' attribute, with any value
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE))

# now loop again, but no need to test for presence of 'at'
# attribute - there will be no match if 'at' is not present
for elTag, tagstart, tagend in el.scanString(xmltext):
    print elTag.at

别理我 2024-11-11 17:48:24

I usually use the get() method to access attributes:

link = soup.find('a')
href = link.get('href')
name = link.get('name')

if name:
    print 'anchor'
if href:
    print 'link'
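
One caveat with a truthiness test such as if name: is that an attribute that is present but empty looks the same as one that is missing. The Python 3 sketch below shows the difference using the standard library's xml.etree.ElementTree as a stand-in (an assumption made so the example runs without third-party packages; it presumes BeautifulSoup's get likewise returns None for a missing attribute and the empty string for an empty one).

```python
import xml.etree.ElementTree as ET

link = ET.fromstring('<a name="" href="/page">text</a>')

name = link.get('name')    # attribute present but empty
title = link.get('title')  # attribute absent

name_is_truthy = bool(name)          # empty string is falsy
name_is_present = name is not None   # but the attribute does exist
title_is_present = title is not None

print(name_is_truthy, name_is_present, title_is_present)
```

Comparing against None is the safer test when empty attribute values are possible.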