Dictionary indexing and "if x in Dict" in BeautifulSoup

Posted on 2024-12-02 00:04:13

I don't think I understand how to check if an array index exists...

    for tag in soup.findAll("input"):
        print tag['type']
        if 'type' in tag:
            print "b"

Outputs:

2255
text
hidden
text
text
text
Traceback (most recent call last):
  File "/home//workspace//src/x.py", line 268, in <module>
    print tag['type']
  File "/home//workspace//src/BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'type'

Why is it not outputting 'b' ever?

Comments (2)

×纯※雪 2024-12-09 00:04:13

A BeautifulSoup Tag is not a dict. It acts like one in some ways (as you discovered, the [] notation gets the value of an attribute), but not in others. The in operator on a Tag checks whether something is a direct child of that tag; it does not check attributes.

Instead, you could do something like this:

if not tag.get('type', None):
    pass # type is empty or nonexistent
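Note that the check above treats a missing attribute and an empty one the same way. If the difference matters, comparing against None separates the two cases. A minimal sketch, using plain dicts as stand-ins for each tag's attributes (the sample data and the helper name describe_type are made up for illustration and are not part of BeautifulSoup):

```python
# Plain dicts stand in for the attributes of three <input> tags.
# (Hypothetical sample data; the answer above uses tag.get the same way.)
samples = [
    {"type": "text", "name": "q"},   # type present
    {"type": "", "name": "empty"},   # type present but empty
    {"name": "untyped"},             # no type attribute at all
]


def describe_type(attrs):
    """Classify the 'type' attribute without ever raising KeyError."""
    value = attrs.get("type")  # None when the key is missing
    if value is None:
        return "missing"
    if not value:
        return "empty"
    return value


results = [describe_type(attrs) for attrs in samples]
print(results)
```

With a real tag, the same pattern would presumably read tag.get('type') in place of attrs.get("type").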

你げ笑在眉眼 2024-12-09 00:04:13

Why is it not outputting 'b' ever?

You're assuming that the tags returned from findAll are dicts, when in fact they're not. The BeautifulSoup library that you're using has its own custom classes, in this case BeautifulSoup.Tag, which may work a lot like a dict, but isn't.

Here, check this out:

    >>> from BeautifulSoup import BeautifulSoup
    >>> doc = ['<html><head><title>Page title</title></head>',
    ...        '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
    ...        '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
    ...        '</html>']
    >>> soup = BeautifulSoup(''.join(doc))
    >>> tag = soup.findAll("p")[0]
    >>> type(tag)
    <class 'BeautifulSoup.Tag'>
    >>> isinstance(tag, dict)
    False

Since it's not actually a dict, you're getting some different (domain-specific) behavior: in this case the in operator tests membership in the tag's list of immediate children (the tags directly contained within the tag you're "indexing") rather than in its attributes.
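To see how an object can support [] for one thing and in for another, here is a small sketch. TagLike is a hypothetical stand-in written for this illustration; it is not BeautifulSoup's actual implementation, only a class whose [] reads attributes while in looks at children, as described above.

```python
class TagLike(object):
    """Hypothetical stand-in for a tag; NOT BeautifulSoup's actual code."""

    def __init__(self, name, attrs=None, contents=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.contents = list(contents or [])

    def __getitem__(self, key):
        # tag['id'] looks the key up in the attributes.
        return self.attrs[key]

    def __contains__(self, item):
        # 'x in tag' looks at the children, never at the attributes.
        return item in self.contents

    def get(self, key, default=None):
        # Like dict.get: returns the default instead of raising KeyError.
        return self.attrs.get(key, default)


bold = TagLike("b")
para = TagLike("p", attrs={"id": "firstpara"}, contents=[bold])

print(para["id"])          # firstpara
print("id" in para)        # False: "id" is not a child of para
print(bold in para)        # True: bold is a direct child
print("id" in para.attrs)  # True: the attributes are a real dict
```

Python dispatches [] to __getitem__ and in to __contains__ independently, so nothing forces the two to agree the way they do on a dict.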

It looks like you want to know whether the input tag has a type attribute. According to the BeautifulSoup documentation, you can list a tag's attributes using tag.attrs and tag.attrMap.

    >>> tag.attrs
    [(u'id', u'firstpara'), (u'align', u'center')]
    >>> tag.attrMap
    {u'align': u'center', u'id': u'firstpara'}
    >>> 'id' in tag.attrMap
    True

BeautifulSoup is a really helpful library, but it's one that you have to play with a bit to get the results you want. Make sure to spend time in the interactive console playing with the classes, and remember to use the help(someobject) syntax to see what you're playing with and what methods it has.
