BeautifulSoup - help me pick out divs and classes

Posted on 2024-11-01 15:45:16

Here's my HTML code:

<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>

<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>

Notice BlockA and BlockB. Both contain the same elements, i.e. name and number, but they are inside separate classes. I'm new to Python and was thinking of trying something like:

parsedHTML = soup.findAll("div", attrs={"name" : "number"})

but that just gives me a blank screen. Is it possible for me to do a findAll from within BlockA, display the data, then start another loop from BlockB and do the same?

Thanks.

EDIT: For those asking, I want to simply loop through the values and output in JSON like this:

BlockA
    John Smith
    2
    Paul Peterson
    14

BlockB
    Steve Whoever
    123
    Mr Whathisface
    23

Comments (2)

魂ガ小子 2024-11-08 15:45:16

You want to find divs that contain a class attribute of "name" or "number"?

>>> import re
>>> soup.findAll("div", {"class":re.compile("name|number")})

[<div class="name">John Smith</div>, <div class="number">2</div>, <div class="name">Paul Peterson</div>, <div class="number">14</div>, <div class="name">Steve Jones</div>, <div class="number">5</div>]
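For context, the attempt in the question, `attrs={"name" : "number"}`, asks for divs that have an attribute called `name` whose value is `number`, and none of the divs in the snippet have one, so the result is empty. Below is a minimal sketch of the regex approach using the newer `bs4` package and its `find_all` / `class_` spelling; that package choice and the `html.parser` backend are assumptions, not something the thread specifies. The pattern is anchored here because an unanchored `name|number` would also match classes such as `username`.

```python
import re

from bs4 import BeautifulSoup

# The HTML snippet from the question.
html = """
<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>

<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchored so that only the exact class names "name" or "number" match.
pattern = re.compile(r"^(name|number)$")
matches = soup.find_all("div", class_=pattern)

# Pull out just the text of each matching div.
texts = [div.get_text() for div in matches]
print(texts)
```
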
倾其所爱 2024-11-08 15:45:16

You need to use a list of possible class values.

soup.findAll('div', {'class': ['name', 'number']})
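
The question also asks whether the search can be run from within BlockA only. A rough sketch of that, again assuming the `bs4` package and the `html.parser` backend (neither is named in the thread): find the outer div first, then call `find_all` on that tag so only its descendants are searched.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question.
html = """
<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>

<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the outer block, then search only inside it.
block_a = soup.find("div", class_="BlockA")
rows = block_a.find_all("div", class_=["name", "number"])

block_a_texts = [row.get_text() for row in rows]
print(block_a_texts)
```

The same two steps can be repeated with `class_="BlockB"` to process the second block separately.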

After seeing your edit:

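def grab_content(heading):
    # Take the first child (the text) of every tag that follows the heading
    siblings = [s.contents[0] for s in heading.findNextSiblings()]
    # Map the heading text to the list of sibling texts
    return {heading.contents[0]: siblings}

# Each h4 marks the start of a block
headings = soup.findAll('h4')
[grab_content(h) for h in headings]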

And the output for your original HTML snippet would be:

[{u'BlockA': [u'John Smith', u'2', u'Paul Peterson', u'14']},
 {u'BlockB': [u'Steve Jones', u'5']}]
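
Since the edit to the question mentions JSON, here is one possible way to turn the blocks into a JSON string. It is only a sketch under several assumptions: the `bs4` package with `html.parser`, that each name is followed by its number so the two lists can be paired with `zip`, and that the numbers are integers. The `name` / `number` keys in the output are an illustrative choice, not something the question specifies.

```python
import json

from bs4 import BeautifulSoup

# The HTML snippet from the question.
html = """
<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>

<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

result = {}
for heading in soup.find_all("h4"):
    # The div that encloses this heading is the block to search.
    block = heading.parent
    names = [d.get_text(strip=True) for d in block.find_all("div", class_="name")]
    numbers = [d.get_text(strip=True) for d in block.find_all("div", class_="number")]
    # Assumption: names and numbers alternate, so the nth name goes with the nth number.
    result[heading.get_text(strip=True)] = [
        {"name": name, "number": int(number)}
        for name, number in zip(names, numbers)
    ]

print(json.dumps(result, indent=4))
```
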