Retrieving values from child nodes using minidom

Posted 2024-11-28 21:09:09

I am very new to XML and I am trying to retrieve the values from child nodes:

from xml.dom import minidom

def Get_ExtList(progName):
    progFile='%s.xml'%progName
    xmldoc = minidom.parse(progFile)
    extList=[]
    rootNode=xmldoc.firstChild
    progNode=rootNode.childNodes[1]
    for fileNodes in progNode.childNodes:
        newList=[]      
        for formatNodes in fileNodes.childNodes:        
            for nodes in formatNodes.childNodes:
                x=nodes.toxml()
                x=' '.join(x.split())
                newList.append(str(x))
        extList.append(newList)     
    print extList

Output:

[[], ['.aaa'], [], ['.bbb'], [], ['.ccc'], [], ['.ddd'], [], ['.xxx', '.yyy'], []]

but I want something like this:

[['.aaa'], ['.bbb'], ['.ccc'], ['.ddd'], ['.xxx', '.yyy']]

Here is a sample file:

<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .bbb
      </format>
    </file>
    <file>
      <format>
        .ccc
      </format>
    </file>
    <file>
      <format>
        .ddd
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>

Comments (3)

静谧 2024-12-05 21:09:09

You are looping not only over the <file> elements (node type ELEMENT_NODE) but also over the indentation whitespace between them (node type TEXT_NODE). For example, in this element:

<a>
  <b>c</b>
</a>

<a> has three child nodes:

  • TEXT_NODE with value \n__ (spaces indicated with _)
  • ELEMENT_NODE with value <b>c</b>
  • TEXT_NODE with value \n

If the file were written without whitespace, as <a><b>c</b></a>, there would be only one child node inside <a>, an ELEMENT_NODE.

You could, for example, skip these nodes:
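To make the node breakdown above concrete, here is a minimal sketch in Python 3 that parses both layouts with minidom.parseString and lists what sits directly under the root. The helper name child_summary is only for illustration and is not part of minidom.

```python
from xml.dom import minidom, Node

def child_summary(xml_text):
    """Return (node type name, content) for each direct child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    names = {Node.ELEMENT_NODE: "ELEMENT_NODE", Node.TEXT_NODE: "TEXT_NODE"}
    summary = []
    for child in root.childNodes:
        # Text nodes keep their characters in .data; elements are shown via toxml()
        content = child.data if child.nodeType == Node.TEXT_NODE else child.toxml()
        summary.append((names.get(child.nodeType, str(child.nodeType)), content))
    return summary

indented = "<a>\n  <b>c</b>\n</a>"
compact = "<a><b>c</b></a>"

print(child_summary(indented))  # whitespace text, the <b> element, whitespace text
print(child_summary(compact))   # only the <b> element
```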

for fileNodes in progNode.childNodes:    
    if fileNodes.nodeType != fileNodes.ELEMENT_NODE:    
        continue

or check whether newList was built for an element node, and append it only when the node is an ELEMENT_NODE:

    if fileNodes.nodeType == fileNodes.ELEMENT_NODE:    
        extList.append(newList)         

Otherwise an empty list [] is appended for every whitespace text node.
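Putting the first suggestion into the question's function might look like the sketch below. It is written for Python 3, and it makes a few assumptions for the sake of a self-contained example: the function takes an already parsed document instead of a file name, it returns the list instead of printing it, and SAMPLE is a shortened copy of the question's file.

```python
from xml.dom import minidom

# Shortened version of the sample file from the question (assumption: same layout)
SAMPLE = """<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>"""

def get_ext_list(xmldoc):
    ext_list = []
    root_node = xmldoc.firstChild
    prog_node = root_node.childNodes[1]
    for file_node in prog_node.childNodes:
        # Skip the whitespace text nodes between <file> elements
        if file_node.nodeType != file_node.ELEMENT_NODE:
            continue
        new_list = []
        for format_node in file_node.childNodes:
            # Whitespace text nodes have no children, so they contribute nothing here
            for node in format_node.childNodes:
                new_list.append(' '.join(node.toxml().split()))
        ext_list.append(new_list)
    return ext_list

print(get_ext_list(minidom.parseString(SAMPLE)))
```

This keeps the question's use of toxml() and of childNodes[1] to reach <progname>; only the node-type check is new.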

樱花细雨 2024-12-05 21:09:09

In this case you could try to process the list and delete empty elements:

>>> list = [[], ['.inp'], [], ['.mdp'], [], ['.xtc'], [], ['.top'], [], ['.gro', '.pdb'], []]
>>> for i in list:
...   if not i:
...     list.remove(i)
... 
>>> list
[['.inp'], ['.mdp'], ['.xtc'], ['.top'], ['.gro', '.pdb']]
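One caveat with the session above: removing items from a list while iterating over it can skip entries when two empty lists sit next to each other, and the name list shadows the built-in type. A sketch of an alternative that builds a new list instead (the names raw, tricky and drop_empty are only illustrative):

```python
def drop_empty(lists):
    """Return a new list containing only the non-empty sublists."""
    return [item for item in lists if item]

raw = [[], ['.inp'], [], ['.mdp'], [], ['.xtc'], [], ['.top'], [], ['.gro', '.pdb'], []]
print(drop_empty(raw))

# Adjacent empty lists are handled as well, because nothing is mutated during iteration
tricky = [[], [], ['.inp'], [], []]
print(drop_empty(tricky))
```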

落墨 2024-12-05 21:09:09

DOM nodes can be elements, text, or even comments. Also note that toxml should not be used to extract text content. Instead, use the .data property of text nodes:

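# Replaces the two inner loops: iterate over the children of each <file> node
for node in fileNodes.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        # Collect the text-node children of this <format> element
        tns = (tn.data for tn in node.childNodes if tn.nodeType == tn.TEXT_NODE)
        newList.append(''.join(tns).strip())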
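For a complete picture, the sketch below combines the .data approach with getElementsByTagName, which is a different way of finding the <file> elements than the question's childNodes indexing. It assumes Python 3, parses from a string instead of a file, and uses a shortened copy of the question's sample; the names get_ext_list and SAMPLE are illustrative.

```python
from xml.dom import minidom

# Shortened version of the sample file from the question (assumption: same layout)
SAMPLE = """<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>"""

def get_ext_list(xml_text):
    doc = minidom.parseString(xml_text)
    ext_list = []
    # getElementsByTagName returns element nodes only, so no whitespace filtering is needed here
    for file_node in doc.getElementsByTagName('file'):
        new_list = []
        for node in file_node.childNodes:
            if node.nodeType == node.ELEMENT_NODE:
                # Join the text children of <format> and trim the surrounding whitespace
                tns = (tn.data for tn in node.childNodes if tn.nodeType == tn.TEXT_NODE)
                new_list.append(''.join(tns).strip())
        ext_list.append(new_list)
    return ext_list

print(get_ext_list(SAMPLE))
```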