How to select chunks of HTML with Nokogiri using XPath or CSS selectors?
In my Rails app I have HTML like the following, which I parse with Nokogiri.
I want to be able to select chunks of this HTML. For example, how can I use XPath or CSS to select the block of HTML that belongs to <sup id="21">? Assume that the ******** marker lines do not exist in the real HTML; they only show where that block starts and ends.
I want to split the HTML at each <sup id=*>, but the problem is that the nodes are siblings.
<sup class="v" id="20">
1
</sup>
this is some random text
<p></p>
more random text
<sup class="footnote" value='fn1'>
[v]
</sup>
# ****************************** starting here
<sup class="v" id="21">
2
</sup>
now this is a different section
<p></p>
how do we keep this separate
<sup class="footnote" value='fn2'>
[x]
</sup>
# ****************************** ending here
<sup class="v" id="23">
3
</sup>
this is yet another different section
<p></p>
how do we keep this separate too
<sup class="footnote" value='fn3'>
[r]
</sup>
A simple solution gives you NodeSets containing all the nodes between consecutive <sup … class="v"> elements, hashed by their id. Alternatively, if you don't want the delimiting sup to be part of each collection, leave it out of the set.
There is also an alternative, even more generic solution.
It looks like you want to select everything between the sup with @id='21' and the sup with @id='23'. Use an ad-hoc XPath expression for this, or an application of the Kayessian node-set intersection formula.