BeautifulSoup: How to remove empty tables while keeping partially empty or non-empty ones

Posted 2025-01-01 09:00:10

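I have an old website originally created in MS Frontpage that I'm trying to defrontpagify. I've written a BeautifulSoup script that does most of the work. The only thing left is to remove empty tables, i.e. tables with no text content or data in any of their td tags.

The problem I'm stuck on is that what I've tried so far removes a table if at least one of its td tags contains no data, even if the others do. That removes every table in the document, including the ones with data I want to preserve.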

tags = soup.findAll('table',text=None,recursive=True) 
[tag.extract() for tag in tags]

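Any suggestions on how to remove only the tables in which none of the td tags contain any data? (I don't care if they contain img or empty anchor tags, as long as there's no text.)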

伏妖词 2025-01-08 09:00:10

Use the .text property. It recursively retrieves all the text content inside the element.

Example:

from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)

html = """
<table id="empty">
  <tr><td></td></tr>
</table>

<table id="with_text">
  <tr><td>hey!</td></tr>
</table>

<table id="with_text_in_one_row">
  <tr><td></td></tr>
  <tr><td>hey!</td></tr>
</table>

<table id="no_text_but_img">
  <tr><td><img></td></tr>
</table>

<table id="no_text_but_a">
  <tr><td><a></a></td></tr>
</table>

<table id="text_in_a">
  <tr><td><a>hey!</a></td></tr>
</table>

"""

soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table"):
    # In bs4, .text includes the whitespace between tags,
    # so strip it before testing whether the table has any text.
    if table.get_text(strip=True):
        print(table["id"])

Output:

with_text
with_text_in_one_row
text_in_a
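
The example above only prints the tables that have text. To actually drop the empty ones, as the question asks, something along these lines might work. This is a minimal sketch, assuming BeautifulSoup 4 (bs4) with the built-in "html.parser"; the helper name remove_empty_tables and the shortened sample HTML are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

HTML = """
<table id="empty"><tr><td></td></tr></table>
<table id="with_text_in_one_row">
  <tr><td></td></tr>
  <tr><td>hey!</td></tr>
</table>
<table id="no_text_but_img"><tr><td><img></td></tr></table>
<table id="text_in_a"><tr><td><a>hey!</a></td></tr></table>
"""


def remove_empty_tables(soup):
    """Detach every <table> whose <td> cells hold no non-whitespace text.

    Hypothetical helper: returns the id attributes of the removed tables.
    """
    empty = [
        table
        for table in soup.find_all("table")
        # any() is False when every cell is blank, or when there are no cells.
        if not any(td.get_text(strip=True) for td in table.find_all("td"))
    ]
    for table in empty:
        table.extract()  # the same removal call the question uses
    return [table.get("id") for table in empty]


soup = BeautifulSoup(HTML, "html.parser")
removed = remove_empty_tables(soup)
kept = [table["id"] for table in soup.find_all("table")]
print("removed:", removed)
print("kept:", kept)
```

The empty tables are collected first and detached afterwards, so the document is not modified while it is being searched. Because both find_all("td") and get_text() are recursive, an outer layout table is kept whenever a table nested inside it has text.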
