Reading XML using Python minidom and iterating over each node
I have an XML structure that looks like the following, but on a much larger scale:
<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>
For this, I used the following code:
from xml.dom.minidom import parse

dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)
However, the author text that gets printed is None. I tried variations like the one below, but they cause my program to break:
authortext = a[0].nodeValue
The correct output should be:
1
Bob
Nigel
2
Alice
Mary
But what I get is:
1
None
None
2
None
None
Any suggestions on how to tackle this problem?
The node you read authortext from (a) is of type 1 (ELEMENT_NODE); normally you need a TEXT_NODE to get a string. Reading the value from the element's child text node will work.
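To make the node-type point concrete, here is a minimal Python 3 sketch. The one-element document is a hypothetical stand-in for the question's file, and the names `author` and `text_node` are chosen only for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical one-element document standing in for the question's file.
dom = parseString("<author>Bob</author>")
author = dom.documentElement

# The element itself is an ELEMENT_NODE (type 1) and has no nodeValue.
print(author.nodeType == Node.ELEMENT_NODE, author.nodeValue)  # True None

# Its first child is a TEXT_NODE (type 3) whose nodeValue holds the string.
text_node = author.childNodes[0]
print(text_node.nodeType == Node.TEXT_NODE, text_node.nodeValue)  # True Bob
```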
Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data (data is the same as nodeValue for text nodes).
Be careful: if there is no text content, there will be no child Text nodes and element.firstChild will be None, causing the .data access to fail.
A quick way to get the content of the direct child text nodes is to concatenate the data of every child whose nodeType is TEXT_NODE.
In DOM Level 3 Core you get the textContent property, which returns the text inside an Element recursively, but minidom doesn't support this (some other Python DOM implementations do).
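Collecting the direct child text nodes could be sketched as below. This is an illustrative Python 3 version, not the answer's original snippet: `direct_text` is a hypothetical helper name, and the sample document (one author with text, one empty author) is made up to show the empty-element case as well.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def direct_text(element):
    """Concatenate the data of the element's direct child Text nodes."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


# Hypothetical sample: one author with text, one empty author.
dom = parseString("<root><author>\nBob\n</author><author/></root>")
first, empty = dom.getElementsByTagName("author")

print(repr(direct_text(first).strip()))  # 'Bob'
print(empty.firstChild is None)          # True, so empty.firstChild.data would fail
print(repr(direct_text(empty)))          # ''
```

Unlike `element.firstChild.data`, the helper returns an empty string for an element without text instead of raising an error.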
Quick access:
Since you always have one text data value per author, you can use element.firstChild.data.
I played around with it a bit, and here's what I got to work:
leading to output of:
I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.
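The child node is needed because the DOM stores an element's text in separate Text nodes beneath the element, not in the element itself. A minimal sketch of that approach follows, written for Python 3 and assuming the question's XML is inlined as a string (`XML`) rather than read with `parse(filepath)`. `conference_lines` is a hypothetical helper name, and the `strip()` call is an addition to drop the newlines and indentation that surround each name in this layout.

```python
from xml.dom.minidom import parseString

# The question's XML, inlined as a string for this sketch.
XML = """<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>"""


def conference_lines(xml_text):
    """Return each conference name followed by its author names."""
    lines = []
    dom = parseString(xml_text)
    for conf in dom.getElementsByTagName("conference"):
        lines.append(conf.getAttribute("name"))
        for author in conf.getElementsByTagName("author"):
            # Read the child Text node; the Element's own nodeValue is None.
            lines.append(author.childNodes[0].nodeValue.strip())
    return lines


lines = conference_lines(XML)
print("\n".join(lines))
```

With the XML above this prints 1, Bob, Nigel, 2, Alice, Mary on separate lines, matching the output the question asks for.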