How to parse a table inside a table using Beautiful Soup?

Posted on 2024-10-02 11:15:24

I tried this:

s = soup.findAll("table", {"class": "view"})

But that gives me the outer table, and I need the table inside that table.

<table class="view" >
    <tr>
        <td width="46%" valign="top">
        <table>
    <tr>
        <td>
            <div style="adasdasd">
                <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
                <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
                <b>My Face</b><br />
                    Hello This is me,
                </div>
            <div class="abc"">
                    Dec 6, 2010 by Alis
                </div>
        </td>
    </tr>
        </table>
    </tr>
    </table>

The text I want to scrape is:

    Hello This is me,

    My Face

    Dec 6, 2010 by Alis

流云如水 2024-10-09 11:15:24

s = soup.findAll("table", {"class": "view"})[0].find("table")

If there's just the one table with class "view", you could use .find for the outer lookup too, and drop the [0].
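As a minimal sketch of that .find variant, assuming the bs4 package with the built-in "html.parser" backend and a trimmed-down copy of the question's markup (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the markup from the question (assumed, for illustration).
html = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr><td><b>My Face</b></td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# First table carrying class "view" (find returns None if there is no match).
outer = soup.find("table", {"class": "view"})

# find() only searches descendants, so this is the first table nested inside it.
inner = outer.find("table") if outer is not None else None

inner_text = inner.get_text(strip=True) if inner is not None else None
print(inner_text)
```

The None checks are there because .find returns None when nothing matches, whereas indexing [0] on an empty findAll result raises an IndexError.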

优雅的叶子 2024-10-09 11:15:24

Here's some better-formatted HTML:

<table class="view" >
    <tr>
        <td width="46%" valign="top">
            <table>
                <tr>
                    <td>
                        <div style="adasdasd">
                            <div class="abc">dasdsadasdasdas</div>
                        </div>
                        <div>
                            <span>
                                <span class="aaaaaaa " title="aaaaaaaaaaa">
                                    <span>aaaaaaaaaaaaa</span>
                                </span>
                            </span>
                            <b>My Face</b>
                            <br />
                            Hello This is me,
                        </div>
                        <div class="abc">
                            Dec 6, 2010 by Alis
                        </div>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

Note: I added a closing </td> tag, because the original was missing one.

# Gets the table in the first cell of the first row
innerTable = soup.find("table", {"class": "view"}).tr.td.table

# Gets the div that holds all of your content. A bare .nextSibling can return
# the whitespace between the tags, so ask for the next <div> sibling instead.
innerDiv = innerTable.find("div", {"style": "adasdasd"}).findNextSibling("div")

That gets you to the div that holds all of your content. From there, it's just a little bit of parsing to get the content you actually need.
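One way that last bit of parsing might look is sketched below. It assumes the bs4 package with the "html.parser" backend and the cleaned-up markup from this answer. It uses find_next_sibling("div") to reach the content div, which skips any whitespace text nodes between tags. The names title, message, and byline are just illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

# The cleaned-up markup from this answer.
html = """
<table class="view">
  <tr>
    <td width="46%" valign="top">
      <table>
        <tr>
          <td>
            <div style="adasdasd">
              <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
              <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
              <b>My Face</b><br />
              Hello This is me,
            </div>
            <div class="abc">
              Dec 6, 2010 by Alis
            </div>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
inner_table = soup.find("table", {"class": "view"}).find("table")

# The content div is the first <div> sibling after the styled one.
content_div = inner_table.find("div", {"style": "adasdasd"}).find_next_sibling("div")

# The title is the text of the <b> tag.
title = content_div.b.get_text(strip=True)

# The message is loose text directly inside the div, so keep only the
# non-empty strings that are direct children (not nested in <span> or <b>).
message = " ".join(
    child.strip()
    for child in content_div.children
    if isinstance(child, NavigableString) and child.strip()
)

# The date line lives in the next sibling <div class="abc">.
byline = content_div.find_next_sibling("div", {"class": "abc"}).get_text(strip=True)

print(title)
print(message)
print(byline)
```

Filtering on direct children keeps the text nested in the spans out of the message. If the real page has more loose text in that div, the join will concatenate all of it, so the filter may need tightening.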
