In Ruby, how do I get the combined content of an XML tag when parsing?

Posted 07-23 08:27

I have some XHTML (but really any XML will do) like this:

<h1>
  Hello<span class='punctuation'>,</span>
  <span class='noun'>World<span class='punctuation'>!</span>
</h1>

How do I get the full content of the <h1/> as a String in Ruby? As in:

assert_equal "Hello, World!", h1_node.some_method_that_aggregates_all_content

Do any of the XML frameworks (Nokogiri, libxml-ruby, &c.) have this sort of thing built in? If not, I feel like a Y-Combinator might be the right tool for the job, but I can't quite figure out what it would look like.

不美如何 2024-07-30 08:27:55

With Nokogiri you can just ask a node for its text. The catch is that every space and newline inside that node comes back too, so you will probably want to strip those out (there is likely a better way to do that than the one I used in this example).

Here is a sample:

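require "nokogiri"

def test_nokogiri_text
  # Parse the markup from the question as HTML.
  value = Nokogiri::HTML.parse(<<-HTML_END)
    <h1>
      Hello<span class='punctuation'>,</span>
      <span class='noun'>World<span class='punctuation'>!</span>
    </h1>
  HTML_END

  h1_node = value.search("h1").first
  # Node#text keeps the source whitespace, so collapse each run to a single
  # space and trim the ends before comparing.
  assert_equal("Hello, World!", h1_node.text.split(/\s+/).join(' ').strip)
end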
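As for the "better way" to drop the whitespace, one possibility is plain `String#split` with no arguments, which splits on runs of whitespace and ignores leading and trailing whitespace, so no extra `strip` is needed. The sketch below runs without Nokogiri; the helper name `squish_text` is hypothetical, and `raw` is only an assumed stand-in for what `h1_node.text` returns.

```ruby
# Hypothetical helper (not part of Nokogiri): collapse every run of
# whitespace to a single space and drop leading/trailing whitespace.
def squish_text(text)
  # String#split without arguments splits on whitespace runs and
  # discards empty leading fields, so no separate strip is required.
  text.split.join(" ")
end

# Assumed shape of the text returned for the <h1> node.
raw = "\n      Hello,\n      World!\n     "
puts squish_text(raw)
```

In the test this would read `assert_equal("Hello, World!", squish_text(h1_node.text))`.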

我的奇迹 2024-07-30 08:27:55

Nokogiri's Nokogiri::XML::Node#content will do it:

irb(main):020:0> node
=> <h1>
  Hello<span class="punctuation">,</span>
  <span class="noun">World<span class="punctuation">!</span>
</span>
</h1>
irb(main):021:0> node.content
=> "\n  Hello,\n  World!\n\n"
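On the Y-Combinator idea from the question: ordinary recursion over the child nodes is enough to aggregate the text. The sketch below is only an illustration of that idea, not how Nokogiri implements `content`. `TextNode`, `ElementNode`, and `aggregate_text` are hypothetical names on a hand-built tree, so it runs without any XML library.

```ruby
# Hypothetical minimal node types; real XML libraries have their own classes.
TextNode = Struct.new(:value)
ElementNode = Struct.new(:name, :children)

# Concatenate the text of a node and all of its descendants in document order.
def aggregate_text(node)
  case node
  when TextNode then node.value
  when ElementNode then node.children.map { |child| aggregate_text(child) }.join
  else raise ArgumentError, "unexpected node: #{node.inspect}"
  end
end

# Hand-built tree mirroring the parsed <h1> shown above.
h1 = ElementNode.new("h1", [
  TextNode.new("\n  Hello"),
  ElementNode.new("span", [TextNode.new(",")]),
  TextNode.new("\n  "),
  ElementNode.new("span", [
    TextNode.new("World"),
    ElementNode.new("span", [TextNode.new("!")]),
    TextNode.new("\n"),
  ]),
  TextNode.new("\n"),
])

# Collapse whitespace for display.
puts aggregate_text(h1).split.join(" ")
```

For this tree the raw aggregate is the same string that `node.content` returns in the irb session, whitespace included.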
