How to get an element's xml:id using ElementTree in Python

Posted on 2025-01-16 00:21:06

I'm sorry if this is a really basic question, but I've been stuck on this problem for hours and just can't make it work.

I'm working with the British National Corpus (whose files are in XML format) and I want to extract the attributes of the different persons in those files.
The part I'm working with is structured like this:

<bncDoc>
    <teiHeader>
        <profileDesc>
            <particDesc n="C196">
                <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
                    <persName>j. hammond</persName>
                    <occupation>interviewer</occupation>
                </person>
                <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
                    <persName>Bhagan</persName>
                </person>
            </particDesc>
        </profileDesc>
    </teiHeader>
</bncDoc>

I'm trying to extract the "id", "sex", "soc", and "ageGroup" attributes of the "person" elements, but I don't know how to handle the "xml:id" attribute. The approach shown below works for "sex", "soc", and "ageGroup", but not for "xml:id".
Does anyone know how to make it work? That would help me a lot! :)

for i in root.findall('teiHeader/profileDesc/particDesc/person'):
    tmp = []
    tmp.append(i.get('id'))
    tmp.append(i.get('sex'))
    tmp.append(i.get('soc'))
    tmp.append(i.get('ageGroup'))

甜心 2025-01-23 00:21:06

It works if you use

i.get('{http://www.w3.org/XML/1998/namespace}id')

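This looks a bit ugly, but it is necessary because xml: is a special namespace prefix that is always bound to the http://www.w3.org/XML/1998/namespace URI, and ElementTree refers to namespaced attributes by their {URI}name form rather than by their prefix. See https://www.w3.org/XML/1998/namespace.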
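
To make this concrete, here is a minimal sketch that applies the lookup to the snippet from the question. The names XML_NS, XML_ID, SAMPLE, and extract_persons are only illustrative and not part of ElementTree; the sample is parsed from a string here, whereas a real corpus file would presumably be loaded with ET.parse.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is reserved and always bound to this URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
# ElementTree addresses namespaced attributes as "{uri}local".
XML_ID = f"{{{XML_NS}}}id"

# Sample data copied from the question (hypothetical stand-in for a corpus file).
SAMPLE = """<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
          <persName>j. hammond</persName>
          <occupation>interviewer</occupation>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>"""


def extract_persons(root):
    """Return [id, sex, soc, ageGroup] for every person element."""
    rows = []
    for person in root.findall("teiHeader/profileDesc/particDesc/person"):
        rows.append([
            person.get(XML_ID),      # xml:id needs the namespaced name
            person.get("sex"),       # plain attributes work as before
            person.get("soc"),
            person.get("ageGroup"),
        ])
    return rows


root = ET.fromstring(SAMPLE)
persons = extract_persons(root)
print(persons)
```

If an attribute lookup unexpectedly returns None, printing person.attrib shows the exact keys ElementTree stored, including the {URI} part of any namespaced attribute.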
