Python xml.dom.minidom removeChild whitespace issue

Posted on 2024-08-20 09:57:38

I'm trying to read an XML file into Python, pull certain elements out of it, and write the result back to an XML file (so basically the original file minus several elements). When I use .removeChild(source), it removes the elements I want removed but leaves whitespace in their place, which makes the file very hard to read. I know I can still parse the file with all of that whitespace, but there are times when I need to edit the values of certain elements' attributes by hand, and the whitespace makes that difficult (and annoying). I can certainly remove the whitespace by hand, but with dozens of these XML files that's not really feasible.

Is there a way to do .removeChild and have it remove the white space as well?

Here's what my code looks like:

from xml.dom.minidom import parse

# filename, outfile, X, Y, ROI and angsep() are assumed to be defined elsewhere.
dom = parse(filename)
main = dom.documentElement
sources = main.getElementsByTagName("source")
for source in sources:
    name = source.getAttribute("name")
    spatialModel = source.getElementsByTagName("spatialModel")
    params = spatialModel[0].getElementsByTagName("parameter")
    val1 = float(params[0].getAttribute("value"))
    val2 = float(params[1].getAttribute("value"))
    if angsep(val1, val2, X, Y) >= ROI:
        # Removes the element, but the whitespace text nodes around it stay.
        main.removeChild(source)
    else:
        print(name, val1, val2, angsep(val1, val2, X, Y))
with open(outfile, "w") as f:
    f.write("<?xml version=\"1.0\" ?>\n")
    f.write(dom.saveXML(main))

Thanks much for the help.

维持三分热 2024-08-27 09:57:38

If you have PyXML installed, you can use xml.dom.ext.PrettyPrint().

烧了回忆取暖 2024-08-27 09:57:38

I couldn't figure out how to do this using xml.dom.minidom, so I just wrote a quick function to read in the output file and remove all blank lines and then rewrite to a new file:

import re

# Matches lines that are empty or contain only whitespace.
empty = re.compile(r'^\s*$')
with open(xmlfile) as f, open('src_model.xml', 'w') as w:
    for line in f:
        if empty.match(line):
            continue
        w.write(line)

This works well enough for me :)

暮年慕年 2024-08-27 09:57:38

… for people who find this by searching:

This funny snippet

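# Python 3: sorted() has no cmp argument, so sort on a key in reverse order.
# An empty string stands in for nodes without a tagName (e.g. comments).
skey = lambda x: getattr(x, "tagName", "")
mainnode.childNodes = sorted(
  [n for n in mainnode.childNodes if n.nodeType != n.TEXT_NODE],
  key=skey, reverse=True)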

removes all text nodes (and also reverse-sorts the remaining nodes by tag name).

I.e. you can (recursively) do tr.childNodes = [recurseclean(n) for n in tr.childNodes if n.nodeType != n.TEXT_NODE] to remove all text nodes.

Or you might want something like … if n.nodeType != n.TEXT_NODE or not re.match(r'^\s*$', n.data) (I didn't try that one myself) if you need to keep text nodes that carry some data. Or something more complex to keep text inside specific tags.

After that tree.toprettyxml(…) will return well-formatted XML text.
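
The recursive cleanup described above could be sketched roughly as follows. This is a minimal illustration rather than the answer's exact code: the helper name strip_whitespace_nodes and the sample document are made up for the example, and it calls removeChild instead of reassigning childNodes, so minidom keeps its sibling links up to date.

```python
import xml.dom.minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)


# Hypothetical sample document, for illustration only.
SAMPLE = """<library>
  <source name="a"><parameter value="1.0"/></source>

  <source name="b">some text</source>
</library>"""

dom = xml.dom.minidom.parseString(SAMPLE)
strip_whitespace_nodes(dom.documentElement)
pretty = dom.documentElement.toprettyxml(indent="  ")
print(pretty)
```

Text nodes that carry real data (such as "some text" here) are kept; only the whitespace-only ones are dropped, so toprettyxml does not add blank lines on top of the old indentation.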

南城追梦 2024-08-27 09:57:38

I know this question is quite dated, but since it took me a while to figure out different approaches to the problem, here are my solutions:

The best way I found is to use lxml:

from lxml import etree

root = etree.fromstring(data)
# for tag in root.iter('tag') doesn't cope with namespaces...
for tag in root.xpath('//*[local-name() = "tag"]'):
    tag.getparent().remove(tag)
data = etree.tostring(root, encoding = 'utf-8', pretty_print = True)
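
If lxml is not available, something similar might be done with the standard library's xml.etree.ElementTree. This is a swapped-in alternative, not the lxml approach itself, and it rests on a few assumptions: the sample input is invented for the example, ET.indent needs Python 3.9 or later, and the plain tag comparison does not handle namespaces the way the local-name() XPath does.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, for illustration only.
DATA = """<root>
  <keep id="1"/>
  <tag id="2"/>
  <group>
    <tag id="3"/>
    <keep id="4"/>
  </group>
</root>"""

root = ET.fromstring(DATA)

# ElementTree elements have no getparent(), so visit every potential parent
# and remove its matching children. list(...) snapshots the tree walk and
# each child list before anything is mutated.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "tag":
            parent.remove(child)

# ET.indent (Python 3.9+) rewrites the whitespace between elements in place.
ET.indent(root, space="  ")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Removing an element this way also drops the whitespace stored in its tail, so ET.indent is used afterwards to restore consistent indentation.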

With minidom it's a bit more convoluted, because in an indented document each element is followed by a whitespace-only text node:

import xml.dom.minidom

dom = xml.dom.minidom.parseString(data)
for tag in dom.getElementsByTagName('tag'):
    # Drop the whitespace-only text node that follows the element, if any.
    if tag.nextSibling \
            and tag.nextSibling.nodeType == tag.TEXT_NODE \
            and tag.nextSibling.data.isspace():
        tag.parentNode.removeChild(tag.nextSibling)
    tag.parentNode.removeChild(tag)
data = dom.documentElement.toxml(encoding = 'utf-8')
