Problems accessing attributes in BeautifulSoup
I am having problems using Python (2.7). The code basically consists of:

    from BeautifulSoup import BeautifulStoneSoup

    str = '<el at="some">ABC</el><el>DEF</el>'
    z = BeautifulStoneSoup(str)
    for x in z.findAll('el'):
        # Two alternative conditions, tried one at a time:
        # if 'at' in x:
        # if hasattr(x, 'at'):
            print x['at']
        else:
            print 'nothing'

I expected the first if statement to work correctly (i.e. print "nothing" if at doesn't exist), but it always prints "nothing" (i.e. the condition is always False). The second if, on the other hand, is always True, which causes the code to raise a KeyError when it tries to access at on the second <el> element, where of course it doesn't exist.
Comments (4)
The in operator is for sequence and mapping types; what makes you think the object returned by BeautifulSoup is supposed to implement it the way you expect? According to the BeautifulSoup docs, you should access attributes using the [] syntax.

Re hasattr: I think you are confusing HTML/XML attributes with Python object attributes. hasattr is for the latter, and as far as I know BeautifulSoup doesn't reflect the HTML/XML attributes it parsed in its own object attributes.

P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you're trying with the wrong object? Can you show a complete but minimal example that demonstrates the problem?

Running a complete minimal version of your example, I get what I expected: the first el has an at attribute, the second doesn't, and accessing it throws a KeyError.

Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not its attributes. To check whether an attribute exists, use in against the tag's attributes rather than against the tag itself.

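The distinction drawn in Update 2 can be sketched roughly as follows. This is only an illustration, and it assumes the newer bs4 package on Python 3 rather than the BeautifulSoup 3 / Python 2.7 setup from the question; the variable names and the html.parser backend are choices made for the example.

```python
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module from the question

markup = '<el at="some">ABC</el><el>DEF</el>'
soup = BeautifulSoup(markup, "html.parser")
first, second = soup.find_all("el")

# `in` applied to a Tag looks at the tag's children (its contents),
# not at its HTML/XML attributes, so this is False even though
# the first <el> does carry an "at" attribute.
at_in_first_tag = "at" in first

# Membership on the attribute dict, or has_attr(), checks the attributes.
first_has_at = "at" in first.attrs
second_has_at = second.has_attr("at")

print(at_in_first_tag, first_has_at, second_has_at)
```

In BeautifulSoup 3, which the question uses, the attribute storage differs, so the exact spelling of the check may not carry over unchanged.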
If your code is as simple as you provided, you can solve it in a compact way with:
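One compact form, not necessarily the one this answer had in mind, relies on get() with a default value. The sketch assumes bs4 on Python 3 instead of the question's BeautifulSoup 3; the names markup, soup and values are made up for the example.

```python
from bs4 import BeautifulSoup  # assumption: bs4 on Python 3

markup = '<el at="some">ABC</el><el>DEF</el>'
soup = BeautifulSoup(markup, "html.parser")

# get() returns the attribute value, or the supplied default when the
# attribute is absent, so no explicit if/else is needed.
values = [el.get("at", "nothing") for el in soup.find_all("el")]
for value in values:
    print(value)
```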
To just scan for an element by tag name, a pyparsing solution might be more readable (and it avoids deprecated APIs like has_key):

As an added refinement, pyparsing lets you define a filtering parse action so that a tag only matches if a particular attribute value (or any value for that attribute) is found:
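Both ideas, scanning by tag name and filtering on an attribute, might look roughly like this. The sketch assumes pyparsing 3.x with its snake_case names (make_html_tags, scan_string, with_attribute); older releases spell these makeHTMLTags, scanString and withAttribute. The variable names are illustrative only.

```python
import pyparsing as pp  # assumption: pyparsing 3.x

markup = '<el at="some">ABC</el><el>DEF</el>'

# make_html_tags returns a pair of expressions: opening tag and closing tag.
el_start, el_end = pp.make_html_tags("el")

# Scan for every <el> opening tag; parsed attributes are available by name,
# and get() supplies a default when the attribute is missing.
all_at_values = [
    tokens.get("at", "nothing")
    for tokens, _start, _end in el_start.scan_string(markup)
]

# Filtering parse action: only match <el> tags that carry an "at"
# attribute, whatever its value.
el_with_at = el_start.copy().set_parse_action(
    pp.with_attribute(at=pp.with_attribute.ANY_VALUE)
)
filtered_at_values = [
    tokens["at"] for tokens, _start, _end in el_with_at.scan_string(markup)
]

print(all_at_values, filtered_at_values)
```

Passing a concrete value, for example with_attribute(at="some"), would restrict the match to that attribute value instead of any value.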
I usually use the get() method to access attributes.
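As a small illustration of why get() helps here, the sketch below contrasts it with [] access on a tag that lacks the attribute. It assumes bs4 on Python 3 rather than the question's BeautifulSoup 3, and the names present, missing and raised are made up for the example.

```python
from bs4 import BeautifulSoup  # assumption: bs4 on Python 3

markup = '<el at="some">ABC</el><el>DEF</el>'
soup = BeautifulSoup(markup, "html.parser")
first, second = soup.find_all("el")

present = first.get("at")    # value of the attribute
missing = second.get("at")   # None, because no default was given

# The [] syntax raises KeyError for a missing attribute instead.
try:
    second["at"]
    raised = False
except KeyError:
    raised = True

print(present, missing, raised)
```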