Reading XML with Python minidom and iterating over each node

Posted on 2024-08-04 10:30:52 · 1044 words · 3 views · 0 comments

I have an XML structure that looks like the following, but on a much larger scale:

<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>

For this, I used the following code:

from xml.dom.minidom import parse

dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)

However, the authortext that is printed out is None. I tried variations like the one below, but they cause my program to break.

authortext=a[0].nodeValue

The correct output should be:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to tackle this problem?

Comments (5)

顾挽 2024-08-11 10:30:52

In your loop, a is a node of type 1 (ELEMENT_NODE), which is why authortext comes out as None. To get a string you normally need a TEXT_NODE. This will work:

a.childNodes[0].nodeValue
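
For completeness, here is a minimal sketch of that fix applied to the whole loop from the question. It assumes Python 3 and uses parseString on an inline copy of the sample XML instead of parse(filepath); the names XML and conference_authors are made up for this illustration. The strip() call is an addition, needed because the text nodes in the sample include the surrounding newlines and indentation.

```python
from xml.dom.minidom import parseString

# Inline copy of the sample document from the question (assumption: the real
# file has the same shape, just more of it).
XML = """<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>"""


def conference_authors(xml_text):
    """Return a list of (conference name, [author names]) pairs."""
    dom = parseString(xml_text)
    result = []
    for conf in dom.getElementsByTagName("conference"):
        names = []
        for a in conf.getElementsByTagName("author"):
            # a is an ELEMENT_NODE; its first child is the TEXT_NODE with the name
            names.append(a.childNodes[0].nodeValue.strip())
        result.append((conf.getAttribute("name"), names))
    return result


for name, authors in conference_authors(XML):
    print(name)
    for author in authors:
        print(author)
```

Indexing childNodes[0] still assumes every author element has text inside it; an empty element would raise an IndexError here.
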
策马西风 2024-08-11 10:30:52

Element nodes don't have a usable nodeValue (in minidom it is always None). You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data (for text nodes, data is the same as nodeValue).

Be careful: if there is no text content, there will be no child Text nodes and element.firstChild will be None, so the .data access fails with an AttributeError.
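
One way to guard against that case is a small helper like the sketch below. The name first_text and its default parameter are hypothetical, not part of minidom, and the code assumes Python 3.

```python
from xml.dom.minidom import parseString


def first_text(element, default=""):
    """Return the data of element's first child if it is a text node, else default."""
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return default
    return child.data


# Three <author> elements: one with text, two empty ones.
dom = parseString("<root><author>Bob</author><author/><author></author></root>")
texts = [first_text(a) for a in dom.getElementsByTagName("author")]
print(texts)
```

The empty elements fall back to the default instead of raising, so the loop keeps going on incomplete data.
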

Quick way to get the content of direct child text nodes:

text = ''.join(child.data for child in element.childNodes if child.nodeType == child.TEXT_NODE)

DOM Level 3 Core defines a textContent property that returns the text inside an Element recursively, but minidom doesn't support it (some other Python DOM implementations do).
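
If you need the recursive behaviour in minidom anyway, you can walk the tree yourself. The following is only a rough sketch of the idea, assuming Python 3; text_content is a made-up helper name, not a minidom API, and treating CDATA sections as text is my own choice here.

```python
from xml.dom.minidom import parseString


def text_content(node):
    """Recursively concatenate the data of all text nodes below node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Descend into nested elements such as <b> below
            parts.append(text_content(child))
    return "".join(parts)


dom = parseString("<author>Bob <b>the</b> Builder</author>")
print(text_content(dom.documentElement))
```

Unlike the one-liner for direct children, this also picks up text inside nested elements, so the example prints Bob the Builder.
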

心房的律动 2024-08-11 10:30:52

Quick access to the text of the first author:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue
可爱暴击 2024-08-11 10:30:52

Since you always have one text data value per author, you can use element.firstChild.data:

from xml.dom.minidom import parseString

# `document` is assumed to hold the XML text as a string
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is an element node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print()
    print(conference_name.upper() + " - ")

    authors = conference.getElementsByTagName("author")
    for author in authors:
        # firstChild is the text node inside <author>
        print("  ", author.firstChild.data)

    print()
南巷近海 2024-08-11 10:30:52

I played around with it a bit, and here's what I got to work:

# ...
authortext = a.childNodes[0].nodeValue
print(authortext)

This produces the following output:

C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary

I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.
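
As for the why: the other answers point out that the author node is an element, and the text lives in a separate text node underneath it. A small sketch (assuming Python 3, with a one-element document made up for illustration) makes that visible:

```python
from xml.dom.minidom import parseString

dom = parseString("<author>Bob</author>")
element = dom.documentElement   # the <author> element itself
text = element.firstChild       # the text node inside it

# The element has node type ELEMENT_NODE and no value of its own
print(element.nodeType == element.ELEMENT_NODE, element.nodeValue)
# The child has node type TEXT_NODE and carries the string
print(text.nodeType == text.TEXT_NODE, text.nodeValue)
```

This prints True None for the element and True Bob for its child, which matches the None output in the question.
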
