DOM manipulation in Python (if an element contains only one other element...)

Posted 2024-12-07 19:00:59 · 571 words · 1 view · 0 comments

I need to remove every <p> that is not needed, for example converting <div><p>xxxx</p></div> to <div>xxxx</div>.

How can I do this with the DOM? "If a <div> has only one <p> inside, then assign that <p>'s text to the <div> and remove the <p>."

I'd rather do it with a regex, but some people say that is a bad idea. I can't imagine how it is done with the DOM.

text = "<div><p>xxxx</p></div>"
???

Is it possible to solve this with the DOM at all? Or is a good old regex better for this case?
This is for Python, not JavaScript.
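
To make the regex concern concrete, here is a small sketch of how a first-attempt pattern behaves. The pattern and the name strip_p_naive are only assumptions about what such an attempt might look like, not a recommended approach:

```python
import re

# Hypothetical first-attempt pattern: a <div> that directly wraps one <p>.
NAIVE = re.compile(r"<div><p>(.*?)</p></div>")

def strip_p_naive(html):
    """Replace <div><p>...</p></div> with <div>...</div> using the naive regex."""
    return NAIVE.sub(r"<div>\1</div>", html)

simple = strip_p_naive("<div><p>xxxx</p></div>")
two_paragraphs = strip_p_naive("<div><p>a</p><p>b</p></div>")
with_attribute = strip_p_naive('<div class="c"><p>xxxx</p></div>')

print(simple)          # <div>xxxx</div>
print(two_paragraphs)  # <div>a</p><p>b</div>  (broken markup)
print(with_attribute)  # unchanged: the attribute defeats the pattern
```

The simple case works, but a <div> with two <p> children comes out as broken markup, and a <div> with an attribute is silently skipped.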

Comments (4)

旧人哭 2024-12-14 19:00:59

This works for me:

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"
doc = minidom.parseString(text)

# For each top-level node in the document (here, the single <div>)
for tag in doc.childNodes:
    children = tag.childNodes
    # If there is exactly one child and it is a <p> element
    # (text nodes have no tagName, so check the node type first)
    if (len(children) == 1
            and children[0].nodeType == children[0].ELEMENT_NODE
            and children[0].tagName == 'p'):
        # p_node = <p>xxxx</p>
        p_node = children[0]
        # Delete the <p>xxxx</p>
        tag.removeChild(p_node)
        # Move the <p>'s children into the <div>: <div></div> -> <div>xxxx</div>
        while p_node.firstChild:
            tag.appendChild(p_node.firstChild)

print(doc.toxml())

and yields:

<?xml version="1.0" ?><div>xxxx</div>

I hope you'll accept the answer I gave for your other question too since I put in all the work for you ;)
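
One caveat worth checking before relying on minidom: it is an XML parser, so the input has to be well-formed XML with a single root element. A quick sketch of what that means in practice (parses_as_xml is a made-up helper name for illustration):

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def parses_as_xml(markup):
    """Return True if minidom can parse the markup, False otherwise."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False

well_formed = parses_as_xml("<div><p>xxxx</p></div>")
unclosed_br = parses_as_xml("<div><p>xx<br>xx</p></div>")  # fine as HTML, not as XML
two_roots = parses_as_xml("<div>a</div><div>b</div>")      # no single root element
print(well_formed, unclosed_br, two_roots)  # True False False
```

If the real input is HTML of that kind, an HTML-aware parser is probably the safer choice.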

弥枳 2024-12-14 19:00:59

Here's a way you can do it using BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> somehtml = '<html><title>hey</title><body><p>blah</p><div><p>something</p></div></body></html>'
>>> soup = BeautifulSoup(somehtml, 'html.parser')
>>> for p in soup.find_all('p'):
...     # The parent is a <div> that contains nothing but this <p>
...     if p.parent.name == 'div' and len(p.parent.contents) == 1:
...         # Assign the result so the REPL does not echo the removed tag
...         removed = p.unwrap()
...
>>> soup
<html><title>hey</title><body><p>blah</p><div>something</div></body></html>

This finds every <p> whose parent <div> contains nothing but that <p>, then replaces the <p> with its own contents using unwrap().

恋你朝朝暮暮 2024-12-14 19:00:59

Building upon @jterrace's answer:

(PLEASE EDIT THIS QUESTION SO THAT IT IS COMPLETE, OR COMMENT)

I think the way to go is to create a minidom.Document so that you can modify its xml nodes.

#coding: utf-8

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"

dom = minidom.parseString(text)

for p in dom.getElementsByTagName('p'):
    print(p.childNodes)
    # and what now?
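
One possible way to finish the loop above, as a sketch rather than a definitive answer. It assumes "unneeded" means a <p> that is the only child node of a <div> (so surrounding whitespace text also counts as a sibling), and unwrap_lone_paragraphs is just an illustrative name:

```python
from xml.dom import minidom

def unwrap_lone_paragraphs(markup):
    """Unwrap every <p> that is the only child node of a <div>."""
    dom = minidom.parseString(markup)
    # Copy the NodeList into a plain list before modifying the tree.
    for p in list(dom.getElementsByTagName('p')):
        parent = p.parentNode
        if parent.nodeType != parent.ELEMENT_NODE or parent.tagName != 'div':
            continue
        if len(parent.childNodes) != 1:
            continue  # the <div> holds other text or elements besides this <p>
        # Move the <p>'s children (text and nested elements) up into the <div> ...
        while p.firstChild:
            parent.insertBefore(p.firstChild, p)
        # ... then drop the now-empty <p>.
        parent.removeChild(p)
        p.unlink()
    # Serialize the root element only, which omits the XML declaration.
    return dom.documentElement.toxml()

print(unwrap_lone_paragraphs("<div><p>xxxx</p></div>"))  # <div>xxxx</div>
```

Moving the child nodes instead of copying the text keeps any markup nested inside the <p>, such as <b> or <a>.
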
深巷少女 2024-12-14 19:00:59

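If you have jQuery, this will work.

$('div').each(function() {
    var children = $(this).children();

    // Skip unless the <div> has exactly one child element
    // (this also avoids reading children[0] of an empty <div>)
    if (children.length !== 1)
        return;

    // Skip unless that child is a <p>
    if (children[0].tagName !== "P")
        return;

    // Replace the <div>'s content with the <p>'s content
    this.innerHTML = children[0].innerHTML;
});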