Replacing HTML tags with BeautifulSoup

Posted on 2024-09-26 23:27:06

I'm currently reformatting some HTML pages with BeautifulSoup, and I ran into a bit of a problem.

My problem is that the original HTML has things like this:

<li><p>stff</p></li>

and

<li><div><p>Stuff</p></div></li>

as well as

<li><div><p><strong>stff</strong></p></div><li>

With BeautifulSoup I hope to eliminate the div and p tags, if they exist, but keep the strong tag.

I've been looking through the Beautiful Soup documentation and couldn't find anything that does this. Any ideas?

Thanks.

白况 2024-10-03 23:27:06

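This question probably referred to an older version of BeautifulSoup, because with bs4 you can simply use the unwrap function:

from bs4 import BeautifulSoup

# The parser is named explicitly here; the <html><body> wrapper in the
# output below is what the lxml parser produces.
s = BeautifulSoup('<li><div><p><strong>stff</strong></p></div><li>', 'lxml')
s.div.unwrap()  # returns the removed, now empty tag: <div></div>
s.p.unwrap()    # returns <p></p>
print(s)
# <html><body><li><strong>stff</strong></li><li></li></body></html>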
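
To cover all three cases from the question in one pass, the same unwrap call can be applied in a loop. This is only a sketch: the helper name unwrap_in_list_items and its names parameter are made up for illustration, the sample input uses properly closed </li> tags, and html.parser is chosen so that no <html><body> wrapper is added.

```python
from bs4 import BeautifulSoup

def unwrap_in_list_items(html, names=("div", "p")):
    """Hypothetical helper: drop the named wrapper tags inside every <li>,
    keeping whatever they contain (text, <strong>, ...)."""
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find_all("li"):
        # find_all returns a list, so unwrapping while looping is safe
        for tag in li.find_all(list(names)):
            tag.unwrap()
    return soup

cleaned = unwrap_in_list_items(
    "<ul><li><p>stff</p></li>"
    "<li><div><p>Stuff</p></div></li>"
    "<li><div><p><strong>stff</strong></p></div></li></ul>"
)
print(cleaned)
# <ul><li>stff</li><li>Stuff</li><li><strong>stff</strong></li></ul>
```

Tags outside a <li> are left alone, which may or may not be what you want for your pages.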

风透绣罗衣 2024-10-03 23:27:06

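What you want to do can be done with replaceWith. You have to copy the element you want to use as the replacement and then pass it as the argument to replaceWith. The documentation for replaceWith is quite clear on how to do this.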
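
For illustration, here is roughly what that could look like in bs4, where the method is spelled replace_with (replaceWith is the older BeautifulSoup 3 name). This is a sketch under assumptions rather than the answerer's own code: it handles only the <strong> case from the question, uses a well-formed </li>, and pulls the element out with extract() instead of copying it.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<li><div><p><strong>stff</strong></p></div></li>", "html.parser"
)

wrapper = soup.li.div                    # outermost tag we want to get rid of
keep = wrapper.find("strong").extract()  # detach the element we want to keep
wrapper.replace_with(keep)               # put it where the wrapper used to be

print(soup)
# <li><strong>stff</strong></li>
```

When the wrapper holds only text, as in <li><p>stff</p></li>, there is no child tag to keep, so you would pass the text itself to replace_with, or just use unwrap.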

娇俏 2024-10-03 23:27:06

I saw many answers to this simple question and came here looking for something useful, but unfortunately I didn't find what I was looking for. After a few tries I found a simple solution, and here it is:

from bs4 import BeautifulSoup

htmlData = "<h2>Title</h2><p>text</p>"  # placeholder input; use your own HTML
soup = BeautifulSoup(htmlData, "html.parser")

h2_headers = soup.find_all("h2")

for header in h2_headers:
    header.name = "h1"  # renames the h2 tag to h1

All h2 tags are converted to h1. You can convert any tag just by changing its name.

摘星┃星的人 2024-10-03 23:27:06

You can write your own function to strip tags:

import re

def strip_tags(string):
    return re.sub(r'<.*?>', '', string)

strip_tags("<li><div><p><strong>stff</strong></p></div><li>")
'stff'
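
Note that the pattern above removes every tag, including strong. If the goal is to keep strong, as in the question, a narrower pattern is one option. The sketch below uses hypothetical names (WRAPPER_TAG, strip_wrappers) and assumes simple markup; regular expressions are not a real HTML parser and can trip over comments, scripts, or a ">" inside an attribute value.

```python
import re

# Matches opening and closing <div> and <p> tags, with or without attributes.
# The \b keeps it from matching longer names such as <pre> or <param>.
WRAPPER_TAG = re.compile(r"</?(?:div|p)\b[^>]*>", re.IGNORECASE)

def strip_wrappers(html):
    """Hypothetical helper: delete div/p tags, leave everything else as is."""
    return WRAPPER_TAG.sub("", html)

print(strip_wrappers("<li><div><p><strong>stff</strong></p></div></li>"))
# <li><strong>stff</strong></li>
```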

千鲤 2024-10-03 23:27:06

A simple solution is to take the whole node (here, the div) and work on it as a string:

Convert the node to a string.
Replace the opening <tag> with the required tag/string.
Replace the corresponding closing tag with an empty string.
Convert the modified string back into a parsable tree by passing it to BeautifulSoup.

Here is what I did in my own case.

Example:

<div class="col-md-12 option" itemprop="text">
<span class="label label-info">A</span>

**-2<sup>31</sup> to 2<sup>31</sup>-1**

sup = opts.sup
if sup:  # opts contains a sup tag
    # Serialize the node to a string and rewrite the sup markup
    opts_html = str(opts).replace("<sup>", "^").replace("</sup>", "")

    # Parse the modified string back into a BeautifulSoup tree
    s = BeautifulSoup(opts_html, 'lxml')

    # Reassign to the original variable after the manipulation
    opts = s.find("div", class_="col-md-12 option")

Output:

-2^31 to 2^31-1
Without the manipulation, it would look like this: -231 to 231-1
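
The same result can probably be had without the round trip through a string, by editing the tree directly with insert_before and unwrap. This is a sketch, not the answerer's code: the sample markup is a shortened, closed version of the example div, and html.parser is an arbitrary parser choice.

```python
from bs4 import BeautifulSoup

html = '<div class="col-md-12 option">-2<sup>31</sup> to 2<sup>31</sup>-1</div>'
soup = BeautifulSoup(html, "html.parser")

for sup in soup.find_all("sup"):
    sup.insert_before("^")  # add a caret in front of the exponent
    sup.unwrap()            # drop the <sup> tag but keep its text

text = soup.div.get_text()
print(text)
# -2^31 to 2^31-1
```

This avoids accidentally rewriting a literal "<sup>" that appears somewhere else in the serialized string, for example inside an attribute value.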
