Remove attributes from HTML tags

Posted 2024-11-30 23:18:52

Possible Duplicates:
php: how can I remove attributes from an html tag?
How do I iterate over the HTML attributes of a Beautiful Soup element?

I have some HTML like the following:

<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>

And it needs to come back as this:

<div>
  <p>Hello, world!</p>
  <p>Stack Overflow</p>
</div>

I'd prefer a Python solution, as I'm already using BeautifulSoup in the program where this is needed. However, I'm open to PHP if that's a better solution. I don't think a sed regular expression would be enough, especially since the text may contain a < symbol in the future (I don't control the input).

匿名的好友 2024-12-07 23:18:52

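This also works with sed (extended regular expressions, -E): match

<([a-zA-Z!]+)[[:space:]][^>]*>

and replace it with the first group:

<\1>

for example: sed -E 's/<([a-zA-Z!]+)[[:space:]][^>]*>/<\1>/g'. Requiring whitespace after the tag name keeps attribute-less tags such as <div> from being cut down to <di>.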
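For comparison, a sketch of the same substitution with Python's re module, assuming a pattern that requires whitespace (\s) after the tag name so attribute-less tags such as <div> are left alone. The function name strip_attributes_regex is hypothetical. The last check below illustrates the limitation the question raises: a > inside an attribute value ends the match early.

```python
import re

# Tag name, then whitespace, then everything up to the first ">".
# The whole tag is replaced by "<" + tag name + ">".
ATTR_TAG = re.compile(r"<([a-zA-Z!]+)\s[^>]*>")

def strip_attributes_regex(markup: str) -> str:
    """Regex-based attribute stripping (hypothetical helper); not HTML-aware."""
    return ATTR_TAG.sub(r"<\1>", markup)

html = '<div class="foo">\n  <p id="first">Hello, world!</p>\n  <p id="second">Stack Overflow</p>\n</div>'
print(strip_attributes_regex(html))
```

Closing tags such as </p> never match, because / is not in the tag-name character class.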

乖乖兔^ω^ 2024-12-07 23:18:52

This is straightforward in Python with lxml.

Install lxml and try the following code:

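from lxml.html import tostring, fromstring

html = '''
<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>'''

htmlElement = fromstring(html)
# iter() walks the element itself and all of its descendants;
# attrib.clear() removes every attribute from each one.
for element in htmlElement.iter():
    element.attrib.clear()

# encoding="unicode" returns a str instead of bytes on Python 3
result = tostring(htmlElement, encoding="unicode")

print(result)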
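The approach above can be wrapped in a reusable function. The sketch below assumes only lxml (no cssselect package); strip_attributes_lxml is a hypothetical name. It also skips comment nodes, whose .tag is not a string, so fragments that contain HTML comments are handled as well.

```python
from lxml import html as lxml_html

def strip_attributes_lxml(markup: str) -> str:
    """Parse an HTML fragment, drop all attributes, serialize it back (hypothetical helper)."""
    root = lxml_html.fromstring(markup)
    # iter() yields the root and every descendant, comments included.
    for node in root.iter():
        # Comments and processing instructions have a non-string .tag
        # (a factory function) and carry no attributes, so skip them.
        if isinstance(node.tag, str):
            node.attrib.clear()
    return lxml_html.tostring(root, encoding="unicode")

sample = '<div class="foo"><!-- keep me --><p id="first">Hello, world!</p></div>'
print(strip_attributes_lxml(sample))
```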
