XQuery extraction between two tags
I am currently working on extracting data from HTML. I would like to extract the text between two <p class="xfHeading"> tags.
<p class="xfHeading"><b>XYZ:</b></p>
<p>asdfghjk</p>
<p>sdsdsd</p>
<p>asdvcvcfghjk</p>
<p class="xfHeading"><b>ABC:</b></p>
<P>fvgbhnjm</P>
<p class="xfHeading"><b>PQR:</b></p>
<ul>
</ul>
<p class="xfHeading"><b>MNO:</b></p>
<ul>
<li>jdjshdj</li>
</ul>
The output should be:
asdfghjk
sdsdsd
asdvcvcfghjk
One way to do this is:
/p[@class="xfHeading"]/following-sibling::p[1]|/p[@class="xfHeading"]/following-sibling::p[2]|/p[@class="xfHeading"]/following-sibling::p[3]
or
/p[@class="xfHeading"]/following-sibling::p[position()<4]
However, since the content between the headings keeps changing, I need a solution in which the content between two <p class="xfHeading"> tags is extracted.
Answers (2)
Use:

This means: select the text-node children of all p elements that are following siblings of the first p element in the document whose class attribute has the value xfHeading, and that at the same time precede the second p element in the document whose class attribute has the value xfHeading.
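The same selection can be sketched outside XPath. This is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the question's sample is wrapped in a hypothetical <body> root (ElementTree needs a single document element) and that all the p elements are direct siblings under it. The helper names is_heading and text_between_first_two_headings are hypothetical. It collects the text of every p between the first and second xfHeading paragraph, so it keeps working however many paragraphs sit between them.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root
# because ElementTree requires a single document element.
SAMPLE = """<body>
<p class="xfHeading"><b>XYZ:</b></p>
<p>asdfghjk</p>
<p>sdsdsd</p>
<p>asdvcvcfghjk</p>
<p class="xfHeading"><b>ABC:</b></p>
<P>fvgbhnjm</P>
<p class="xfHeading"><b>PQR:</b></p>
<ul>
</ul>
<p class="xfHeading"><b>MNO:</b></p>
<ul>
<li>jdjshdj</li>
</ul>
</body>"""


def is_heading(el):
    # A heading is a <p> whose class attribute is exactly "xfHeading".
    return el.tag == "p" and el.get("class") == "xfHeading"


def text_between_first_two_headings(root):
    """Return the text of each <p> that follows the first xfHeading
    sibling and precedes the second one."""
    children = list(root)
    positions = [i for i, el in enumerate(children) if is_heading(el)]
    if len(positions) < 2:
        return []  # fewer than two headings: no section is bounded on both sides
    first, second = positions[0], positions[1]
    # Treat <P> like <p>, as HTML does (an assumption; XML is case-sensitive).
    return [
        el.text
        for el in children[first + 1:second]
        if el.tag.lower() == "p" and el.text is not None
    ]


root = ET.fromstring(SAMPLE)
print(text_between_first_two_headings(root))
# ['asdfghjk', 'sdsdsd', 'asdvcvcfghjk']
```

Working with sibling indices mirrors the two conditions in the description above: "after the first heading" is the lower bound of the slice and "before the second heading" is the upper bound.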
EDIT: After your clarification, my suggestion is to use a FLWOR expression such as the following. This looks for a <p> with the proper <b> tag contents, based on the unique contents of that <b> tag, and returns the text of each <p> tag that is a sibling of it.

Note that the // is an XPath construct, not a comment.

OLD ANSWER: Without an example of what you'd like the resulting data to look like, answering the question is a bit tough. However, to select, for instance, the text inside a <b> tag, you'd do:

In general, appending text() to the end of an expression returns the text inside the node in question.
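The label-based lookup from the edit can be approximated the same way. This is a hedged sketch, not the FLWOR expression from the answer: it uses ElementTree's limited XPath support, where p[b='XYZ:'] selects a p whose b child has exactly that text, to find the heading, then walks the following siblings and returns the text of each p. Stopping at the next xfHeading is an assumption added here so the result matches the question's expected output. The function name texts_after_label and the <body> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root.
SAMPLE = """<body>
<p class="xfHeading"><b>XYZ:</b></p>
<p>asdfghjk</p>
<p>sdsdsd</p>
<p>asdvcvcfghjk</p>
<p class="xfHeading"><b>ABC:</b></p>
<P>fvgbhnjm</P>
<p class="xfHeading"><b>PQR:</b></p>
<ul>
</ul>
<p class="xfHeading"><b>MNO:</b></p>
<ul>
<li>jdjshdj</li>
</ul>
</body>"""


def texts_after_label(root, label):
    """Return the text of each <p> sibling after the heading whose <b> text
    equals label, stopping at the next xfHeading heading."""
    # ElementTree's XPath subset supports [child='text'] predicates.
    heading = root.find(f"p[b='{label}']")
    if heading is None:
        return []
    children = list(root)
    result = []
    for el in children[children.index(heading) + 1:]:
        if el.tag == "p" and el.get("class") == "xfHeading":
            break  # assumed bound: the next heading ends the section
        # Treat <P> like <p>, as HTML does (an assumption; XML is case-sensitive).
        if el.tag.lower() == "p" and el.text is not None:
            result.append(el.text)
    return result


root = ET.fromstring(SAMPLE)
print(texts_after_label(root, "XYZ:"))  # ['asdfghjk', 'sdsdsd', 'asdvcvcfghjk']
print(texts_after_label(root, "ABC:"))  # ['fvgbhnjm']
print(texts_after_label(root, "PQR:"))  # []
```

Without the break, the loop would also return the text of every later p sibling. In this sketch the ElementTree .text attribute plays roughly the role of text(): it holds the first text node directly inside the element.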