How do I parse a table inside a table with Beautiful Soup?

Posted 2024-10-02 11:15:24

I tried this: s = soup.findAll("table", {"class": "view"}), but it gives me the outer table. I need the table inside that table.

<table class="view" >
    <tr>
        <td width="46%" valign="top">
        <table>
    <tr>
        <td>
            <div style="adasdasd">
                <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
                <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
                <b>My Face</b><br />
                    Hello This is me,
                </div>
            <div class="abc">
                    Dec 6, 2010 by Alis
                </div>
        </td>
    </tr>
        </table>
    </tr>
    </table>

The things I want to scrape are:

    Hello This is me,

    My Face

    Dec 6, 2010 by Alis
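For reference, here is one way to pull out all three strings. This is a sketch, assuming BeautifulSoup 4 (bs4) with the built-in "html.parser" backend and a tidied copy of the markup above (with the missing </td> restored). The function name scrape and the dict keys are illustrative, not from the question.

```python
from bs4 import BeautifulSoup

# Tidied copy of the question's markup (missing </td> restored, stray quote removed).
HTML = """
<table class="view">
  <tr>
    <td width="46%" valign="top">
      <table>
        <tr>
          <td>
            <div style="adasdasd">
              <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
              <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
              <b>My Face</b><br />
              Hello This is me,
            </div>
            <div class="abc">
              Dec 6, 2010 by Alis
            </div>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
"""


def scrape(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    # "table.view table" is a CSS descendant selector: a <table> nested anywhere inside table.view.
    inner = soup.select_one("table.view table")
    # The <b> tag holds the name.
    name_tag = inner.find("b")
    name = name_tag.get_text(strip=True)
    # The greeting is the bare text node right after the <br/> that follows <b>.
    greeting = name_tag.find_next_sibling("br").next_sibling.strip()
    # Two divs inside the inner table have class "abc"; the date is in the last one.
    date = inner.find_all("div", class_="abc")[-1].get_text(strip=True)
    return {"name": name, "greeting": greeting, "date": date}


print(scrape(HTML))
```

Relying on the <b> and <br/> positions ties the code to this exact layout; if the real page varies, locating the content div first and reading its stripped_strings may be more tolerant.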

Answers (2)

流云如水 2024-10-09 11:15:24
s = soup.findAll("table", {"class": "view"})[0].find("table")

If there's just the one outer table, you could use .find for the first lookup too, and drop the [0].
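A small sketch of this answer, assuming bs4 with the built-in "html.parser" backend; the one-line HTML is a made-up stand-in for the question's markup. findAll is the older camelCase spelling, and bs4 also provides find_all, which is used here. The sketch compares the indexed find_all lookup, the plain find chain, and an equivalent CSS selector:

```python
from bs4 import BeautifulSoup

# Hypothetical minimal markup: one outer table.view containing one nested table.
HTML = '<table class="view"><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>'

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a list of every match; [0] takes the first outer table,
# and .find("table") then searches only inside it, so it returns the nested table.
inner_a = soup.find_all("table", {"class": "view"})[0].find("table")

# With a single outer table, find() returns that first match directly, so no [0].
inner_b = soup.find("table", class_="view").find("table")

# The same lookup as one CSS descendant selector.
inner_c = soup.select_one("table.view table")

print(inner_a is inner_b is inner_c)  # True
print(inner_a.get_text())  # inner
```

All three return the same Tag object, so the choice is mostly about readability.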

优雅的叶子 2024-10-09 11:15:24

Here's some better-formatted HTML:

<table class="view" >
    <tr>
        <td width="46%" valign="top">
            <table>
                <tr>
                    <td>
                        <div style="adasdasd">
                            <div class="abc">dasdsadasdasdas</div>
                        </div>
                        <div>
                            <span>
                                <span class="aaaaaaa " title="aaaaaaaaaaa">
                                    <span>aaaaaaaaaaaaa</span>
                                </span>
                            </span>
                            <b>My Face</b>
                            <br />
                            Hello This is me,
                        </div>
                        <div class="abc">
                            Dec 6, 2010 by Alis
                        </div>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

Note: I actually added a </td> tag because it was missing one.

# gets the table in the first cell of the first row
innerTable = soup.find("table", {"class": "view"}).tr.td.table

# gets the div holding "My Face" and "Hello This is me,"; find_next_sibling("div")
# skips the whitespace text node that .nextSibling would return here
innerDiv = innerTable.find("div", {"style": "adasdasd"}).find_next_sibling("div")

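That gets you to the div that holds the name and greeting; the date is in the next sibling div with class "abc". From there it's just a little bit of parsing to get the content you actually need.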
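A runnable sketch of this answer's navigation, assuming bs4 with the built-in "html.parser" backend and a trimmed copy of the formatted HTML above. It also shows why next_sibling (nextSibling in the older spelling) alone is fragile with indented markup: the node right after the style div is a whitespace string, not the next div. The variable names are illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the answer's formatted HTML, keeping the whitespace between the divs.
HTML = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr>
          <td>
            <div style="adasdasd"><div class="abc">dasdsadasdasdas</div></div>
            <div>
              <b>My Face</b><br />
              Hello This is me,
            </div>
            <div class="abc">Dec 6, 2010 by Alis</div>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
inner_table = soup.find("table", {"class": "view"}).tr.td.table
style_div = inner_table.find("div", {"style": "adasdasd"})

# next_sibling is the whitespace text node between the two divs, not a tag.
raw = style_div.next_sibling
print(isinstance(raw, NavigableString), repr(raw.strip()))  # True ''

# find_next_sibling("div") skips text nodes and returns the next <div> element.
content_div = style_div.find_next_sibling("div")

# stripped_strings yields each non-blank text fragment with surrounding whitespace removed.
name, greeting = list(content_div.stripped_strings)
date = content_div.find_next_sibling("div", class_="abc").get_text(strip=True)
print(name, "|", greeting, "|", date)
```

Unpacking into exactly two names assumes the content div contains only the name and greeting; extra text fragments would raise a ValueError, which at least makes a layout change visible.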
