
Python XML String lxml elementtree

How do I get all the strings from all nested tags of an XML tag using Python's lxml.etree library?

Posted on 2024-11-10 16:34:51

I have an xml file in which it is possible that the following occurs:

...
<a><b>This is</b> some text about <c>some</c> issue I have, parsing xml</a>
...

Edit: Let's assume the tags could be nested more than one level deep, meaning

<a><b><c>...</c>...</b>...</a>
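In lxml, an element's text is split between its own .text (the text before its first child) and each child's .tail (the text after that child's closing tag). A minimal recursive sketch that walks both at any nesting depth; the function name collect_text and the sample string are hypothetical:

```python
from lxml import etree


def collect_text(element):
    """Concatenate all text inside `element`, at any nesting depth.

    Text before the first child lives in `.text`; text after each
    child's closing tag lives in that child's `.tail`.
    """
    parts = [element.text or ""]
    for child in element:
        if not isinstance(child.tag, str):
            # Comment or processing instruction: skip its content,
            # but keep the text that follows it.
            parts.append(child.tail or "")
            continue
        # Recurse into the child, then append the text after it.
        parts.append(collect_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical sample with three levels of nesting.
root = etree.fromstring("<a><b><c>This</c> is</b> some text about <c>some</c> issue</a>")
print(repr(root.text))     # None: <a> starts directly with <b>
print(collect_text(root))  # This is some text about some issue
```

lxml's own element.itertext() yields the same pieces in document order, so "".join(element.itertext()) is a shorter equivalent.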

I came up with this using the python lxml.etree library.

from lxml import etree

context = etree.iterparse(PATH_TO_XML, dtd_validation=True, events=("end",))
for event, element in context:
    tag = element.tag
    if tag == "a":
        print(element.text)  # is empty :/ (.text only holds text before the first child)
        mystring = element.xpath("string()")
        ...

But somehow it goes wrong.

What I want is the whole string

"This is some text about some issue I have, parsing xml"

But I only get an empty string. Any suggestions? Thanks!
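A minimal, self-contained sketch for checking this behavior in isolation, assuming the in-memory sample below (hypothetical) stands in for the file at PATH_TO_XML; DTD validation is left out because the sample has no DTD. On an "end" event the <a> subtree has been fully parsed, so xpath("string()") and lxml's itertext() should both return the whole string, while .text is None:

```python
import io

from lxml import etree

# Hypothetical stand-in for the file at PATH_TO_XML (no DTD, so no validation).
SAMPLE = b"<root><a><b>This is</b> some text about <c>some</c> issue I have, parsing xml</a></root>"

results = []
for event, element in etree.iterparse(io.BytesIO(SAMPLE), events=("end",)):
    if element.tag == "a":
        # .text is only the text before the first child <b>, so this prints None.
        print(repr(element.text))
        # At the "end" event the whole subtree exists, so both of these
        # include every descendant's text and tail.
        via_xpath = element.xpath("string()")
        via_itertext = "".join(element.itertext())
        results.append((via_xpath, via_itertext))

print(results)
```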

Comments (1)

神妖 2024-11-17 16:34:51

This question has been asked many times.

You can use the text_content() method of elements parsed with lxml.html.

import lxml.html
t = lxml.html.fromstring("...")
t.text_content()

REF: Filter out HTML tags and resolve entities in python
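A minimal sketch of the text_content() approach applied to the sample from the question. Because lxml.html uses a forgiving HTML parser, a non-HTML tag such as <c> is tolerated rather than rejected:

```python
import lxml.html

# Sample fragment taken from the question.
fragment = "<a><b>This is</b> some text about <c>some</c> issue I have, parsing xml</a>"
t = lxml.html.fromstring(fragment)

# text_content() returns the text of the element and all of its descendants.
text = t.text_content()
print(text)
```

For strict XML input (for example with DTD validation, as in the question), the HTML parser is not a drop-in replacement, so the element-level approaches with lxml.etree may fit better.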

Or use the lxml.etree.strip_tags() function.

REF: In lxml, how do I remove a tag but retain all contents?
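A minimal sketch of the strip_tags() approach on the question's sample. Collecting the tag names with iterdescendants() so that arbitrary nesting is handled is an addition for illustration, not part of the answer. Note that this mutates the tree in place, so it suits cases where the inner markup is no longer needed:

```python
from lxml import etree

root = etree.fromstring(
    "<a><b>This is</b> some text about <c>some</c> issue I have, parsing xml</a>"
)

# strip_tags() removes the named descendant tags but merges their text and
# tails into the parent; the element passed in (<a>) is never removed.
# Collect every descendant tag name; skip comments/PIs, whose .tag is not a str.
names = {el.tag for el in root.iterdescendants() if isinstance(el.tag, str)}
etree.strip_tags(root, *names)

print(len(root))  # 0: no child elements remain
print(root.text)  # This is some text about some issue I have, parsing xml
```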
