XPath: Question about the p element in HTML

Posted on 2024-11-09 04:51:23

I have a question concerning XPath and the p-element from HTML. Let's say I'm confronted with an HTML-structure that looks like this:

<div id="this-is-a-text">
This is text segment 1.
<p>This is text segment 2.</p>
this is text segment 3.
<div id="this-is-not-part-of-the-text">This doesn't belong to the text.</div>
This is text segment 4.
</div>

I'm wondering what the correct way is to parse all text segments, no matter whether they're inside a p-element or not. (NB: The sequence of the elements is random.)

What I don't understand is why //div[@id="this-is-a-text"]/p seems to do the job (instead of just returning text segment 3), whereas //div[@id="this-is-a-text"]/text() doesn't return any results at all.

Can anyone help me understand this?

Thanks!

Bob


一刻暧昧 2024-11-16 04:51:23

As Martin Honnen mentioned, the query //div[@id="this-is-a-text"]/text() should return a set of three text nodes, because /text() selects only the text nodes that are direct children of the div:

"\nThis is text segment 1.\n",
"\nthis is text segment 3.\n",
"\nThis is text segment 4.\n"

If I understand your question correctly, you need a query like

//div[@id="this-is-a-text"]//text()

Here //text() selects text nodes at any depth below the div, so this should return the set:

"\nThis is text segment 1.\n",
"This is text segment 2.",
"\nthis is text segment 3.\n",
"This doesn't belong to the text.",
"\nThis is text segment 4.\n"
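
The difference between the two queries can be reproduced with a minimal sketch in Python. It uses only the standard library's xml.etree.ElementTree, whose limited XPath support has no text() node test, so the helper functions below (direct_text, all_text, text_with_p) are hypothetical names that imitate the queries by hand instead of running them. The last helper is an assumption about what the question is after: the div's own text plus the text of its p children, without the nested div.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; it happens to be well-formed XML.
SNIPPET = """<div id="this-is-a-text">
This is text segment 1.
<p>This is text segment 2.</p>
this is text segment 3.
<div id="this-is-not-part-of-the-text">This doesn't belong to the text.</div>
This is text segment 4.
</div>"""

root = ET.fromstring(SNIPPET)


def direct_text(elem):
    """Imitates elem/text(): text nodes that are direct children of elem.

    ElementTree stores them as elem.text (before the first child) and
    child.tail (after each child element).
    """
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p and p.strip()]


def all_text(elem):
    """Imitates elem//text(): text nodes at any depth below elem."""
    return [t for t in elem.itertext() if t.strip()]


def text_with_p(elem):
    """Assumed goal of the question: direct text plus text inside p children."""
    parts = [elem.text]
    for child in elem:
        if child.tag == "p":
            parts.extend(child.itertext())
        parts.append(child.tail)
    return [p.strip() for p in parts if p and p.strip()]


print(direct_text(root))   # three nodes: segments 1, 3 and 4
print(all_text(root))      # five nodes, including the nested div's text
print(text_with_p(root))   # segments 1 to 4, in document order
```

With a library that implements full XPath 1.0, the third case could presumably be written as a union of the two paths, //div[@id="this-is-a-text"]/text() | //div[@id="this-is-a-text"]/p/text(), which returns the nodes in document order regardless of how the elements are arranged.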
