
Python DOM

DOM manipulation in Python (if an element contains only another element...)

Posted on 2024-12-07 19:00:59 · 571 characters · 1 view · 0 comments


I need to remove all <p> elements that are not needed, for example converting <div><p>xxxx</p></div> to <div>xxxx</div>.

How can I do it with DOM? "If a <div> has only one <p> inside, then assign that <p>'s text to the <div> and remove the <p>".

I'd rather do it with regex, but some people say that is a bad idea. I can't imagine how it is done with DOM.

text = "<div><p>xxxx</p></div>"
???

Is it possible to solve this with DOM at all? Or is good old regex better for this case?
Python, not JavaScript.


旧人哭 2024-12-14 19:00:59


This works for me:

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"
doc = minidom.parseString(text)

# For each top-level node in the document (here, the single <div>)
for tag in doc.childNodes:
    children = tag.childNodes
    # Proceed only if the sole child is a <p> element (not a text node)
    if (len(children) == 1
            and children[0].nodeType == children[0].ELEMENT_NODE
            and children[0].tagName == 'p'):
        # p_node = <p>xxxx</p>
        p_node = children[0]
        # Delete the <p>xxxx</p> from the <div>
        tag.removeChild(p_node)
        # Move everything inside the <p> into the <div>:
        # <div></div> -> <div>xxxx</div>
        while p_node.firstChild is not None:
            tag.appendChild(p_node.firstChild)

print(doc.toxml())

and yields:

<?xml version="1.0" ?><div>xxxx</div>

I hope you'll accept the answer I gave for your other question too since I put in all the work for you ;)


弥枳 2024-12-14 19:00:59


Here's a way you can do it using BeautifulSoup (the session below uses the BeautifulSoup 3 API, i.e. `import BeautifulSoup` and `findAll`):

>>> import BeautifulSoup
>>> somehtml = '<html><title>hey</title><body><p>blah</p><div><p>something</p></div></body></html>'
>>> soup = BeautifulSoup.BeautifulSoup(somehtml)
>>> for p in soup.findAll('p'):
...    if p.parent.string is None and len(p.parent.contents) == 1:
...       p.parent.string = p.string
...       p.extract()
>>> soup
<html><title>hey</title><body><p>blah</p><div>something</div></body></html>

This searches for all <p> elements that have a parent with no content and only one child (the <p> element), then copies the contents of the <p> element to the parent and removes the <p> element.
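
The same rule can also be sketched with only the standard library. The version below swaps BeautifulSoup for `xml.etree.ElementTree`, so it assumes the markup is well-formed XML (real-world HTML often is not). The helper name `unwrap_only_child_p` is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def unwrap_only_child_p(xml_text):
    """Hypothetical helper: unwrap each <p> that is the sole content of its parent."""
    root = ET.fromstring(xml_text)
    # Elements carry no parent pointer in ElementTree, so walk the candidate
    # parents instead. Snapshot the traversal because the tree is edited below.
    for parent in list(root.iter()):
        children = list(parent)
        if len(children) != 1 or children[0].tag != 'p':
            continue
        p = children[0]
        # Text before or after the <p> means the parent has content of its own.
        if (parent.text or '').strip() or (p.tail or '').strip():
            continue
        parent.remove(p)
        parent.text = p.text      # text directly inside the <p>
        parent.extend(list(p))    # any inline elements inside the <p>
    return ET.tostring(root, encoding='unicode')


somehtml = '<html><title>hey</title><body><p>blah</p><div><p>something</p></div></body></html>'
print(unwrap_only_child_p(somehtml))
# <html><title>hey</title><body><p>blah</p><div>something</div></body></html>
```

Like the session above, this leaves the <p> directly under <body> alone because <body> has two children.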


恋你朝朝暮暮 2024-12-14 19:00:59


Building upon @jterrace's answer:

(PLEASE EDIT THIS QUESTION SO THAT IT IS COMPLETE, OR COMMENT)

I think the way to go is to create a minidom.Document so that you can modify its xml nodes.

#coding: utf-8

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"

dom = minidom.parseString(text)

for p in dom.getElementsByTagName('p'):
    print(p.childNodes)
    # and what now?
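
One possible way to finish that loop, as a minimal sketch rather than a definitive answer: for each <p>, check that it is the only child node of a <div>, then move its children up and drop the empty <p>. The function name `unwrap_lone_paragraphs` is hypothetical, and the input is assumed to be well-formed XML, since `minidom.parseString` rejects anything else.

```python
from xml.dom import minidom


def unwrap_lone_paragraphs(xml_text):
    """Hypothetical helper: replace each <p> that is a <div>'s only child with the <p>'s contents."""
    dom = minidom.parseString(xml_text)
    # Copy the result into a plain list because the tree is modified in the loop.
    for p in list(dom.getElementsByTagName('p')):
        parent = p.parentNode
        # Only unwrap when the parent is a <div> element...
        if parent.nodeType != parent.ELEMENT_NODE or parent.tagName != 'div':
            continue
        # ...and the <p> is its only child node (text nodes count as children).
        if len(parent.childNodes) != 1:
            continue
        # insertBefore detaches each node from the <p> before re-inserting it,
        # so this loop ends once the <p> is empty.
        while p.firstChild is not None:
            parent.insertBefore(p.firstChild, p)
        parent.removeChild(p)
    return dom.documentElement.toxml()


text = "<div><p>xxxx</p></div>"
print(unwrap_lone_paragraphs(text))
# <div>xxxx</div>
```

Because `getElementsByTagName` searches the whole document, this also handles a <div> nested deeper in the tree, while a <div> holding two <p> elements or extra text is left unchanged.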


深巷少女 2024-12-14 19:00:59


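If you have jQuery, this will work.

$('div').each(function() {
    // Skip any <div> that does not have exactly one child element
    // (an empty <div> would otherwise throw on children()[0])
    if ($(this).children().length !== 1)
        return;

    // Skip unless that single child is a <p>
    if ($(this).children()[0].tagName !== "P")
        return;

    // Replace the <div>'s contents with the <p>'s contents
    this.innerHTML = $(this).children()[0].innerHTML;
});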
