How to get the xml:id of an element with ElementTree in Python
Sorry if this is a really basic question, but I have been stuck on this problem for hours and just can't make it work.
I'm working with the British National Corpus (whose files are in XML format), and I want to extract the attributes of the different persons in those files.
The part I'm working with is structured like this:
<bncDoc>
<teiHeader>
<profileDesc>
<particDesc n="C196">
<person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
<persName>j. hammond</persName>
<occupation>interviewer</occupation>
</person>
<person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
<persName>Bhagan</persName>
</person>
</particDesc>
</profileDesc>
</teiHeader>
</bncDoc>
I'm trying to extract the "id", "sex", "soc", and "ageGroup" attributes of the "person" elements, but I don't know how to handle "xml:id". The approach shown below works for "sex", "soc", and "ageGroup", but not for "xml:id".
Does anyone know how to make it work? That would help me a lot! :)
for i in root.findall('teiHeader/profileDesc/particDesc/person'):
    tmp = []
    tmp.append(i.get('id'))
    tmp.append(i.get('sex'))
    tmp.append(i.get('soc'))
    tmp.append(i.get('ageGroup'))
Comments (1)
It works if you use the namespace-qualified attribute name:
i.get('{http://www.w3.org/XML/1998/namespace}id')
This looks a bit ugly, but it has to do with the fact that xml: is a special namespace prefix that is bound to the http://www.w3.org/XML/1998/namespace URI. See https://www.w3.org/XML/1998/namespace.
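To make the point above concrete, here is a minimal sketch that parses the sample from the question and reads "xml:id" through its namespace-qualified name. The names XML_NS, XML_ID, SAMPLE, and extract_persons are made up for illustration and are not part of ElementTree or of the original post; the sample is embedded as a string only so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
# ElementTree stores namespaced names as "{uri}localname".
XML_ID = f"{{{XML_NS}}}id"

# Sample data copied from the question (hypothetical stand-in for a BNC file).
SAMPLE = """<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
          <persName>j. hammond</persName>
          <occupation>interviewer</occupation>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>"""


def extract_persons(root):
    """Return one [id, sex, soc, ageGroup] list per <person> element."""
    persons = []
    for person in root.findall("teiHeader/profileDesc/particDesc/person"):
        persons.append([
            person.get(XML_ID),      # namespaced attribute: needs the full name
            person.get("sex"),       # plain attributes work as usual
            person.get("soc"),
            person.get("ageGroup"),
        ])
    return persons


root = ET.fromstring(SAMPLE)
persons = extract_persons(root)
print(persons)
```

For a real corpus file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE). Keeping the qualified name in a constant avoids repeating the long URI at every call site.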