XML: remove the tags but keep the text

Posted on 2025-01-30 04:22:19

I have a pretty big XML file that looks like this:

<corpus>
  <dialogue speaker="A">
    <sentence tag1="a" tag2="b"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="cc" tag2= "dd"> How are you </sentence>
    <sentence tag1="ff" tag2= "e"> today </sentence>
  </dialogue>
  <dialogue speaker="A">
    <sentence tag1="d" tag2= "bbb"> Great </sentence>
    <sentence tag1="f" tag2= "dd"> How about you </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="a" tag2= "dd"> me too </sentence>
  </dialogue>
</corpus>

and I need to remove the subelement tags, so the fragmented text becomes whole again and under the parent, for an output that looks like this:

<corpus>
  <dialogue speaker="A">
    Hello
  </dialogue>
  <dialogue speaker="B">
    How are you today
  </dialogue>
  <dialogue speaker="A">
    Great How about you
  </dialogue>
  <dialogue speaker="B">
     me too
  </dialogue>
</corpus>

I've tried element.strip() and element.tag.strip() but there is no output... this is my code:

import xml.etree.ElementTree as ET

f = ET.parse("file.xml")
root = f.getroot()

for s in root.findall("sentence"):
    text = s.tag.strip("sentence")
    print(text)

What am I doing wrong?
Thank you all for your help!!

Comments (1)

旧伤还要旧人安 2025-02-06 04:22:19

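You're almost there. Two things go wrong in your code: root.findall("sentence") only searches the direct children of corpus, so it matches nothing and the loop never runs; and s.tag.strip("sentence") strips characters from the tag name, it does not read the element's text. Loop over each dialogue and use s.text instead. To get your output, try: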

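for d in root.findall(".//dialogue"):
    sentences = d.findall("sentence")
    # Collect the text of every <sentence> child, skipping empty ones
    parts = [s.text.strip() for s in sentences if s.text and s.text.strip()]
    # Drop the <sentence> elements themselves
    for s in sentences:
        d.remove(s)
    # Put the joined text directly under <dialogue>
    d.text = " ".join(parts)
print(ET.tostring(root).decode())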

And that should output what you need.
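
If you want something you can run without file.xml, here is a minimal sketch of the same idea on a shortened copy of your sample. SAMPLE and flatten_dialogues are names made up for illustration, and it assumes you also want the whitespace between sentences collapsed to single spaces. It uses itertext() instead of s.text, so it would also pick up text from nested elements, if your real file has any.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question (illustrative data only)
SAMPLE = """<corpus>
  <dialogue speaker="A">
    <sentence tag1="a" tag2="b"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="cc" tag2="dd"> How are you </sentence>
    <sentence tag1="ff" tag2="e"> today </sentence>
  </dialogue>
</corpus>"""


def flatten_dialogues(root):
    """Replace the children of each <dialogue> with their combined text."""
    # findall() returns a list, so changing the tree inside the loop is safe
    for dialogue in root.findall(".//dialogue"):
        # itertext() yields the element's own text plus all descendant text and tails
        raw = "".join(dialogue.itertext())
        # split() + join collapses newlines and repeated spaces into single spaces
        combined = " ".join(raw.split())
        # Iterate over a copy because remove() mutates the child list
        for child in list(dialogue):
            dialogue.remove(child)
        dialogue.text = combined
    return root


root = flatten_dialogues(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

For the real file, you would swap ET.fromstring(SAMPLE) for ET.parse("file.xml").getroot(). The speaker attributes are left untouched because only the child elements are removed, not the dialogue element itself.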
