Retrieving values from child nodes using minidom
I am very new to XML and I am trying to retrieve the values from child nodes:
    from xml.dom import minidom

    def Get_ExtList(progName):
        progFile = '%s.xml' % progName
        xmldoc = minidom.parse(progFile)
        extList = []
        rootNode = xmldoc.firstChild
        progNode = rootNode.childNodes[1]
        for fileNodes in progNode.childNodes:
            newList = []
            for formatNodes in fileNodes.childNodes:
                for nodes in formatNodes.childNodes:
                    x = nodes.toxml()
                    x = ' '.join(x.split())
                    newList.append(str(x))
            extList.append(newList)
        print(extList)
Output:
    [[], ['.aaa'], [], ['.bbb'], [], ['.ccc'], [], ['.ddd'], [], ['.xxx', '.yyy'], []]
but I want something like this:
    [['.aaa'], ['.bbb'], ['.ccc'], ['.ddd'], ['.xxx', '.yyy']]
Here is a sample file:
    <?xml version="1.0" ?>
    <program>
      <progname name="TEST">
        <file>
          <format>
            .aaa
          </format>
        </file>
        <file>
          <format>
            .bbb
          </format>
        </file>
        <file>
          <format>
            .ccc
          </format>
        </file>
        <file>
          <format>
            .ddd
          </format>
        </file>
        <file>
          <format>
            .xxx
          </format>
          <format>
            .yyy
          </format>
        </file>
      </progname>
    </program>
Comments (3)
You are looping not only through the nodes that contain <file> tags (ELEMENT_NODE node type), but also through the indentation whitespace between them (TEXT_NODE node type). For example, an <a> element whose only child element <b>c</b> sits on its own line, indented by two spaces, contains three child nodes:
- a TEXT_NODE with value \n__ (spaces shown as _)
- an ELEMENT_NODE with value <b>c</b>
- a TEXT_NODE with value \n
If that file were formatted differently, as
    <a><b>c</b></a>
there would be only one ELEMENT_NODE inside. You could, for example, skip the whitespace nodes by checking each node's nodeType.
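The node types described above can be inspected directly. This is a minimal sketch: the two XML strings are the illustrative `<a>` documents from this answer, and the helper name `describe` is made up for the example.

```python
from xml.dom import minidom, Node

# The same content, once with indentation and once without.
indented = minidom.parseString("<a>\n  <b>c</b>\n</a>")
compact = minidom.parseString("<a><b>c</b></a>")

def describe(element):
    """Return (node type name, serialized form) for each direct child."""
    names = {Node.ELEMENT_NODE: "ELEMENT_NODE", Node.TEXT_NODE: "TEXT_NODE"}
    return [(names.get(child.nodeType, "OTHER"), child.toxml())
            for child in element.childNodes]

indented_children = describe(indented.documentElement)
compact_children = describe(compact.documentElement)
print(indented_children)  # three children: text, element, text
print(compact_children)   # one child: the <b> element

# Skipping everything that is not an element leaves only <b>.
elements_only = [child for child in indented.documentElement.childNodes
                 if child.nodeType == Node.ELEMENT_NODE]
print([child.tagName for child in elements_only])
```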
Alternatively, check whether newList was created for the correct node and append its contents only for an ELEMENT_NODE; otherwise an empty list [] gets appended for every whitespace node.
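One way to apply this to the function from the question is sketched below. It is an adaptation, not the asker's final code: it takes the XML as a string (the hypothetical `SAMPLE`, a shortened version of the sample file) instead of a file name, returns the list instead of printing it, and uses the made-up name `get_ext_list`.

```python
from xml.dom import minidom, Node

# Shortened stand-in for the sample file from the question.
SAMPLE = """<?xml version="1.0" ?>
<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <format>
        .xxx
      </format>
      <format>
        .yyy
      </format>
    </file>
  </progname>
</program>"""

def get_ext_list(xml_text):
    """Collect the <format> values of each <file>, ignoring whitespace nodes."""
    xmldoc = minidom.parseString(xml_text)
    ext_list = []
    root_node = xmldoc.firstChild
    prog_node = root_node.childNodes[1]  # same lookup as in the question
    for file_node in prog_node.childNodes:
        # Indentation between <file> tags shows up as TEXT_NODE children.
        if file_node.nodeType != Node.ELEMENT_NODE:
            continue
        new_list = []
        for format_node in file_node.childNodes:
            if format_node.nodeType != Node.ELEMENT_NODE:
                continue
            for node in format_node.childNodes:
                new_list.append(' '.join(node.toxml().split()))
        ext_list.append(new_list)
    return ext_list

print(get_ext_list(SAMPLE))  # [['.aaa'], ['.xxx', '.yyy']]
```

Because `newList` is only created and appended for element nodes, no empty lists end up in the result.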
In this case you could also post-process the list and delete the empty elements.
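A minimal sketch of that post-processing step, using the output from the question as input. The variable names are illustrative; an empty list is falsy in Python, so a list comprehension with a truthiness test is enough.

```python
# Output produced by the original function in the question.
ext_list = [[], ['.aaa'], [], ['.bbb'], [], ['.ccc'], [], ['.ddd'],
            [], ['.xxx', '.yyy'], []]

# Keep only the non-empty inner lists.
cleaned = [exts for exts in ext_list if exts]
print(cleaned)
# [['.aaa'], ['.bbb'], ['.ccc'], ['.ddd'], ['.xxx', '.yyy']]
```

This hides the symptom rather than the cause: the whitespace text nodes are still visited, their empty lists are just discarded afterwards.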
DOM nodes can be elements, text, or even comments. Also note that toxml should not be used to extract text content; use the .data property of text nodes instead.
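A sketch of what reading `.data` might look like for the question's file layout. The helper names `text_of` and `get_ext_list_from_data` and the inline `SAMPLE` string are assumptions made for this example; `getElementsByTagName` is used so that whitespace and comment nodes between elements never need to be visited.

```python
from xml.dom import minidom, Node

# Hypothetical input in the same layout as the question's sample file.
SAMPLE = """<program>
  <progname name="TEST">
    <file>
      <format>
        .aaa
      </format>
    </file>
    <file>
      <!-- comments are nodes too, and are ignored here -->
      <format>.xxx</format>
      <format>.yyy</format>
    </file>
  </progname>
</program>"""

def text_of(element):
    """Join the .data of the element's direct text children and trim it."""
    return ''.join(child.data for child in element.childNodes
                   if child.nodeType == Node.TEXT_NODE).strip()

def get_ext_list_from_data(xml_text):
    """Return one list of <format> values per <file> element."""
    doc = minidom.parseString(xml_text)
    return [[text_of(fmt) for fmt in file_node.getElementsByTagName('format')]
            for file_node in doc.getElementsByTagName('file')]

print(get_ext_list_from_data(SAMPLE))  # [['.aaa'], ['.xxx', '.yyy']]

# toxml() returns serialized markup, so special characters stay escaped,
# while .data holds the actual text.
amp_node = minidom.parseString('<f>a &amp; b</f>').documentElement.firstChild
print(amp_node.toxml())  # a &amp; b
print(amp_node.data)     # a & b
```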