Retrieving values from child nodes with minidom

Posted 2024-11-28 21:09:09


I am very new to XML and I am trying to retrieve the values from child nodes:

from xml.dom import minidom

def Get_ExtList(progName):
    progFile='%s.xml'%progName
    xmldoc = minidom.parse(progFile)
    extList=[]
    rootNode=xmldoc.firstChild
    progNode=rootNode.childNodes[1]
    for fileNodes in progNode.childNodes:
        newList=[]      
        for formatNodes in fileNodes.childNodes:        
            for nodes in formatNodes.childNodes:
                x=nodes.toxml()
                x=' '.join(x.split())
                newList.append(str(x))
        extList.append(newList)     
    print(extList)

Output:

[[], ['.aaa'], [], ['.bbb'], [], ['.ccc'], [], ['.ddd'], [], ['.xxx', '.yyy'], []]

but I want the following:

[['.aaa'], ['.bbb'], ['.ccc'], ['.ddd'], ['.xxx', '.yyy']]

Here is a sample file:

<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .bbb
      </format>
    </file>
    <file>
      <format>
        .ccc
      </format>
    </file>
    <file>
      <format>
        .ddd
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>


静谧 2024-12-05 21:09:09


You are looping not only over the <file> elements (ELEMENT_NODE node type) but also over the indentation whitespace between them (TEXT_NODE node type). For example, in this element:

<a>
  <b>c</b>
</a>

The <a> element has three child nodes:

TEXT_NODE with value \n__ (spaces indicated with _)
ELEMENT_NODE with value <b>c</b>
TEXT_NODE with value \n

If the file were formatted without whitespace, as <a><b>c</b></a>, then <a> would contain only a single ELEMENT_NODE child.

You could, for example, skip the text nodes:
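
To see this directly, here is a small sketch that parses both forms of the element above and lists the child nodes. It uses `minidom.parseString` on inline strings instead of a file; the variable names are only illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString("<a>\n  <b>c</b>\n</a>")
root = doc.documentElement  # the <a> element

# Map the numeric node types to readable names for the two types seen here.
type_names = {root.ELEMENT_NODE: "ELEMENT_NODE", root.TEXT_NODE: "TEXT_NODE"}

# One (type name, serialized content) pair per child of <a>.
children = [(type_names[n.nodeType], n.toxml()) for n in root.childNodes]
for name, xml in children:
    print(name, repr(xml))

# The same element without indentation has a single child.
compact = minidom.parseString("<a><b>c</b></a>").documentElement
print(len(compact.childNodes))
```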

for fileNodes in progNode.childNodes:    
    if fileNodes.nodeType != fileNodes.ELEMENT_NODE:    
        continue

or check whether newList was built for the right kind of node and append it only for an ELEMENT_NODE:

    if fileNodes.nodeType == fileNodes.ELEMENT_NODE:    
        extList.append(newList)

Otherwise an empty list [] is appended for every whitespace text node.
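
Putting the node-type check into the whole function might look like the sketch below. It assumes a shortened version of the sample file passed in as a string; `element_children` and `get_ext_list` are hypothetical names, not part of minidom or of the original code.

```python
from xml.dom import minidom

SAMPLE = """<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>"""


def element_children(node):
    """Return only the ELEMENT_NODE children, dropping whitespace text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def get_ext_list(xml_text):
    doc = minidom.parseString(xml_text)
    prog_node = element_children(doc.documentElement)[0]  # <progname>
    ext_list = []
    for file_node in element_children(prog_node):
        new_list = []
        for format_node in element_children(file_node):
            # Same whitespace normalisation as in the question's code.
            text = " ".join(format_node.firstChild.toxml().split())
            new_list.append(text)
        ext_list.append(new_list)
    return ext_list


result = get_ext_list(SAMPLE)
print(result)
```

Filtering once in a helper keeps every loop free of index guesses such as childNodes[1], which depend on how the file happens to be indented.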


樱花细雨 2024-12-05 21:09:09


In this case you could try to process the list and delete empty elements:

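>>> ext_list = [[], ['.inp'], [], ['.mdp'], [], ['.xtc'], [], ['.top'], [], ['.gro', '.pdb'], []]
>>> # Build a new list instead of removing items while iterating,
>>> # and avoid shadowing the built-in name `list`.
>>> ext_list = [item for item in ext_list if item]
>>> ext_list
[['.inp'], ['.mdp'], ['.xtc'], ['.top'], ['.gro', '.pdb']]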
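
A note on filtering in place: calling remove() on a list while a for loop is iterating over that same list can skip items when two empty lists sit next to each other, so building a new list is the safer pattern. The sketch below compares the two; the function names and the sample data are illustrative only.

```python
def remove_in_loop(items):
    """Remove empty items from a copy while iterating over it (fragile)."""
    items = list(items)
    for item in items:
        if not item:
            items.remove(item)
    return items


def keep_non_empty(items):
    """Build a new list that contains only the non-empty items."""
    return [item for item in items if item]


alternating = [[], ['.inp'], [], ['.mdp'], []]
adjacent = [[], [], ['.inp']]

print(remove_in_loop(alternating))  # empties alternate, so all are removed
print(remove_in_loop(adjacent))     # one empty list survives
print(keep_non_empty(adjacent))
```

With alternating empties, as in the output from the question, both approaches give the same result; the difference only shows up once the input shape changes.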


落墨 2024-12-05 21:09:09


DOM nodes can be elements, text, or even comments. Also note that toxml should not be used to extract text content. Instead, use the .data property of text nodes:

for node in fileNodes.childNodes:
    # Only the <format> elements carry data; skip whitespace text nodes.
    if node.nodeType == node.ELEMENT_NODE:
        tns = (tn.data for tn in node.childNodes if tn.nodeType == tn.TEXT_NODE)
        newList.append(''.join(tns).strip())
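
As a self-contained variant of the same idea, the sketch below reads text through `.data` and swaps the manual childNodes loops for `getElementsByTagName`, which returns only elements and therefore never sees the whitespace text nodes. This is an alternative technique, not the code from the question; `text_of` and the shortened sample are assumptions made for illustration.

```python
from xml.dom import minidom

SAMPLE = """<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>"""


def text_of(element):
    """Join the .data of the element's direct text-node children."""
    parts = [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]
    return "".join(parts).strip()


doc = minidom.parseString(SAMPLE)

# One inner list per <file>, one entry per <format> inside it.
ext_list = [
    [text_of(fmt) for fmt in file_node.getElementsByTagName("format")]
    for file_node in doc.getElementsByTagName("file")
]
print(ext_list)
```

Note that `getElementsByTagName` searches all descendants, so this relies on <file> and <format> elements not being nested inside other elements of the same name.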
