How do I access untagged elements with Ruby's Sanitize/Nokogiri?

Posted on 2024-09-08 00:34:46, 486 characters, 14 views, 0 comments

I'm trying to build a Sanitize transformer that accepts potentially malformed HTML input with elements outside of any tags at all, such as in this example:

out of a tag<p>in a tag</p>out again!

I want to have the transformer wrap any non-tagged elements in <p> tags so that the above transforms into:

<p>out of a tag</p><p>in a tag</p><p>out again!</p>

Unfortunately, I can't figure out how to select the untagged element because it's not a node. I'm sure I'm missing something here. Can someone give me a nudge in the right direction?

沫雨熙 2024-09-15 00:34:47

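require 'nokogiri'

html = 'out of a tag<p>in a tag</p>out again!'

# Walk every direct child of <body> (elements and text nodes alike)
# and re-wrap its text content in a <p> element.
Nokogiri::HTML(html).at_css('body').children
  .map { |node| "<p>#{node.text}</p>" }
  .join
#=> "<p>out of a tag</p><p>in a tag</p><p>out again!</p>"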

Text is stored in text nodes. Because CSS selectors cannot match text nodes, you have to reach them some other way, for example with Nokogiri::XML::Node#children.
