Using BeautifulSoup, how do I reference table rows in an HTML page?

Posted 2024-09-12 02:16:55

I have an HTML page that looks like this:

    <html>
      ..
      <form post="/products.hmlt" ..>
        ..
        <table ...>
          <tr>...</tr>
          <tr>
            <td>part info</td>
            ..
          </tr>
        </table>
        ..
      </form>
      ..
    </html>

I tried:

form = soup.findAll('form')
table = form.findAll('table')  # table inside form

But I get an error saying:

ResultSet object has no attribute 'findAll'

I guess the call to findAll doesn't return a BeautifulSoup object? What can I do then?

Update

There are many tables on this page, but only one table INSIDE the form tag shown above.
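To see why that error appears, it helps to compare what the search methods return. The sketch below assumes the current bs4 package (where findAll is spelled find_all, with findAll kept as an older alias) and Python's built-in html.parser; the HTML string is a made-up stand-in for the page in the question.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page in the question: one table outside the
# form and one table inside it.
html = """
<html><body>
  <table><tr><td>other table</td></tr></table>
  <form method="post" action="/products">
    <table>
      <tr><th>header</th></tr>
      <tr><td>part info</td></tr>
    </table>
  </form>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which is a subclass of list, not a single Tag.
forms = soup.find_all("form")
print(type(forms).__name__, isinstance(forms, list))

# Calling find_all() on that list is what raises the AttributeError.
try:
    forms.find_all("table")
except AttributeError as exc:
    print("AttributeError:", exc)

# Indexing the list (or calling find()) gives a Tag that can be searched again.
form = forms[0]  # same element that soup.find("form") would return
table = form.find("table")
rows = table.find_all("tr")
print(rows[1].get_text(strip=True))  # part info
```

In short, findAll always hands back a list, even when only one element matches, so one element has to be picked out (or find() used) before searching inside it.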

Answers (2)

最冷一天 2024-09-19 02:16:55

findAll returns a list (a ResultSet), not a single element, so extract the element first:

form = soup.findAll('form')[0]
table = form.findAll('table')[0]  # table inside form

Of course, you should do some error checking (i.e. make sure it's not empty) before indexing into the list.
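A minimal sketch of that error checking, assuming the bs4 package and html.parser. It keeps the find_all-then-index approach from this answer but checks each list before indexing; the function name table_in_first_form and the sample HTML are hypothetical.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def table_in_first_form(soup: BeautifulSoup) -> Optional[Tag]:
    """Return the first <table> inside the first <form>, or None if either is missing."""
    forms = soup.find_all("form")
    if not forms:  # no <form> on the page: nothing to index into
        return None
    tables = forms[0].find_all("table")
    if not tables:  # the form exists but holds no <table>
        return None
    return tables[0]


page = BeautifulSoup(
    "<table><tr><td>outside</td></tr></table>"
    "<form><table><tr><td>part info</td></tr></table></form>",
    "html.parser",
)
table = table_in_first_form(page)
if table is None:
    print("no table inside a form")
else:
    for tr in table.find_all("tr"):
        print(tr.get_text(strip=True))
```

soup.find('form') does the same job without the list: it returns None when nothing matches, so the emptiness checks become `is None` tests.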

梦里梦着梦中梦 2024-09-19 02:16:55

I like ars's answer, and certainly agree with the need for error checking, especially if this is going to be used in any kind of production code.

Here's a perhaps more verbose and explicit way of finding the data you seek:

from bs4 import BeautifulSoup as bs

html = '''<html><body><table><tr><td>some text</td></tr></table>
    <form><table><tr><td>some text we care about</td></tr>
    <tr><td>more text we care about</td></tr>
    </table></form></body></html>'''
soup = bs(html, 'html.parser')

for tr in soup.form.find_all('tr'):
    print(tr.text)
# output:
# some text we care about
# more text we care about

For reference, here is the cleaned-up HTML:

>>> print(soup.prettify())
<html>
 <body>
  <table>
   <tr>
    <td>
     some text
    </td>
   </tr>
  </table>
  <form>
   <table>
    <tr>
     <td>
      some text we care about
     </td>
    </tr>
    <tr>
     <td>
      more text we care about
     </td>
    </tr>
   </table>
  </form>
 </body>
</html>
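For comparison, current bs4 versions (4.7 and later, which use the soupsieve package for CSS selectors) can express "rows of the table inside the form" as a single select() call. A minimal sketch, assuming bs4 with html.parser and reusing the sample HTML from this answer with its closing tags in the right order:

```python
from bs4 import BeautifulSoup

html = """<html><body><table><tr><td>some text</td></tr></table>
    <form><table><tr><td>some text we care about</td></tr>
    <tr><td>more text we care about</td></tr>
    </table></form></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# "form table tr" matches <tr> elements inside a <table> that is inside a <form>,
# so the table outside the form is skipped.
rows = [tr.get_text(strip=True) for tr in soup.select("form table tr")]
for text in rows:
    print(text)
# some text we care about
# more text we care about
```

Like findAll, select() returns a list, so an empty result simply means nothing matched; the same not-empty check applies before indexing into it.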