Correct XPath syntax with Python's lxml library for extracting all text from arbitrarily nested HTML tags

Posted on 2024-11-10 10:51:30

Using lxml in Python, I wrote this XPath expression:

htmlPage.xpath("/html/body//a/text()")

It returns the text of all <a> tags within the part of the HTML document I am interested in. However, I have found that an <a> tag can look like this:

<a>This is a sentence with some <italic>italic text</italic>-formatting I want to parse.</a>

The xpath() call returns a list with one more element than I expect. I checked and found that it splits the text of the <a> tag shown above into two list elements instead of one. Instead of the string

"This is a sentence with some italic text-formatting I want to parse."

I get the two strings

"This is a sentence with some" # and
"-formatting I want to parse."

Is there a way to correct that?
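
A minimal reproduction of the behaviour described above, assuming the page is parsed with lxml.html.fromstring. The SAMPLE string is a hypothetical stand-in, since the real page is not shown:

```python
from lxml import html

# Hypothetical sample document; the original page is not shown in the question.
SAMPLE = (
    "<html><body>"
    "<a>This is a sentence with some <italic>italic text</italic>"
    "-formatting I want to parse.</a>"
    "</body></html>"
)

htmlPage = html.fromstring(SAMPLE)

# text() selects only the text nodes that are direct children of <a>,
# so the text inside <italic> is skipped and the rest arrives in pieces.
parts = htmlPage.xpath("/html/body//a/text()")
print(parts)
```

The list contains one entry per direct text node of the <a> element, which is why a single tag can produce more than one string.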

Comments (1)

好多鱼好多余 2024-11-17 10:51:30

I solved my problem by first selecting all <a> elements

results = htmlPage.xpath("/html/body//a")

and then iterating over the returned list and calling text_content() on each element

for a_tag in results:
    # text_content() is available on lxml.html elements and joins the text of all descendants
    print(a_tag.text_content())  # prints the whole string: "This is a sentence with some italic text-formatting I want to parse."
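
For reference, here is a self-contained sketch of the approach, again using a hypothetical SAMPLE document in place of the real page. It also shows two alternatives that should give the same result: itertext(), which works on plain lxml.etree elements too, and the XPath string() function:

```python
from lxml import html

# Hypothetical sample document standing in for the real page.
SAMPLE = (
    "<html><body>"
    "<a>This is a sentence with some <italic>italic text</italic>"
    "-formatting I want to parse.</a>"
    "</body></html>"
)

htmlPage = html.fromstring(SAMPLE)

texts = []
for a_tag in htmlPage.xpath("/html/body//a"):
    # text_content() is provided by lxml.html elements.
    via_text_content = a_tag.text_content()
    # itertext() yields every text node below the element, in document order.
    via_itertext = "".join(a_tag.itertext())
    # string() evaluates to the concatenated text of the context element.
    via_xpath = a_tag.xpath("string()")
    texts.append((via_text_content, via_itertext, via_xpath))

for variants in texts:
    print(variants[0])
```

All three variants concatenate the text of nested elements, so the <italic> content is no longer lost.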