XML: Remove tags but keep the text
I have a pretty big XML file that looks like this:
<corpus>
<dialogue speaker="A">
<sentence tag1="a" tag2="b"> Hello </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="cc" tag2= "dd"> How are you </sentence>
<sentence tag1="ff" tag2= "e"> today </sentence>
</dialogue>
<dialogue speaker="A">
<sentence tag1="d" tag2= "bbb"> Great </sentence>
<sentence tag1="f" tag2= "dd"> How about you </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="a" tag2= "dd"> me too </sentence>
</dialogue>
</corpus>
I need to remove the subelement tags so that the fragmented text is merged back together directly under the parent element, giving output that looks like this:
<corpus>
<dialogue speaker="A">
Hello
</dialogue>
<dialogue speaker="B">
How are you today
</dialogue>
<dialogue speaker="A">
Great How about you
</dialogue>
<dialogue speaker="B">
me too
</dialogue>
</corpus>
I've tried element.strip() and element.tag.strip(), but there is no output. This is my code:
f = ET.parse("file.xml")
root = f.getroot()
for s in root.findall("sentence"):
    text = s.tag.strip("sentence")
    print(text)
What am I doing wrong?
Thank you all for your help!!
Comments (1)
You're almost there. To get your output, try:
And that should output what you need.
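The snippet from this answer isn't shown here, so what follows is only a sketch of one way to get there with the standard-library xml.etree.ElementTree, not necessarily what the answerer posted. Two things seem to go wrong in the question's code: root.findall("sentence") only looks at direct children of the root (<corpus>), and those are <dialogue> elements, so the loop body never runs; and s.tag is just the tag name, so s.tag.strip("sentence") strips characters from the string "sentence" rather than touching the element. The names flatten_dialogues and SAMPLE below are made up for illustration, and the sample is a shortened copy of the question's data.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: same structure).
SAMPLE = """<corpus>
<dialogue speaker="A">
<sentence tag1="a" tag2="b"> Hello </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="cc" tag2= "dd"> How are you </sentence>
<sentence tag1="ff" tag2= "e"> today </sentence>
</dialogue>
</corpus>"""


def flatten_dialogues(root):
    """Replace each <dialogue>'s <sentence> children with their joined text."""
    # list(...) so the tree can be modified safely while looping.
    for dialogue in list(root.iter("dialogue")):
        sentences = dialogue.findall("sentence")
        # Collect the text of every sentence, trimming surrounding whitespace.
        words = [(s.text or "").strip() for s in sentences]
        # Drop the child elements, then store the merged text on the parent.
        for s in sentences:
            dialogue.remove(s)
        dialogue.text = "\n" + " ".join(w for w in words if w) + "\n"
    return root


root = ET.fromstring(SAMPLE)
flatten_dialogues(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

For the real file, the same function should work on ET.parse("file.xml").getroot(), and the tree can then be saved with tree.write("out.xml", encoding="utf-8") (the output filename is just a placeholder).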