How to remove whitespace gaps between HTML tags with Nokogiri?

Posted 2024-12-28 04:33:45

Say I have this kind of markup:

<li>    Some text </li>
<li> <strong>  Some text </strong> hello</li>

I need to ensure that there is no whitespace gap after the opening <li> tag and before any enclosed text content. What is the best way to accomplish this with Nokogiri?

Desired result:

<li>Some text </li>
<li><strong>Some text </strong> hello</li>

池予 2025-01-04 04:33:45

Removing all leading/trailing whitespace in the whole doc:

doc.xpath('//text()').each do |node|
  if node.content=~/\S/
    node.content = node.content.strip
  else
    node.remove
  end
end

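However, note that this will turn Hello World into HelloWorld when the two words sit in separate text nodes. For example, <p>Hello <b>World</b></p> becomes <p>Hello<b>World</b></p>. You will likely need to specify more precisely what you want.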

Edit: Here's a better solution. It removes leading whitespace from every text node that is the first text-node child of its element, and trailing whitespace from every text node that is the last text-node child:

doc.xpath('//text()[1]').each{ |t|      t.content = t.content.lstrip }
doc.xpath('//text()[last()]').each{ |t| t.content = t.content.rstrip }

Seen in action:

html = '<ul>
  <li>    First text </li>
  <li> <strong>  Some text </strong> </li>
  <li> I am <b>  embedded  </b> and need <i>some </i>  <em>spaces</em>. </li>
</ul>'

require 'nokogiri'
doc = Nokogiri.HTML(html)
doc.xpath('//text()[1]').each{ |t|      t.content = t.content.lstrip }
doc.xpath('//text()[last()]').each{ |t| t.content = t.content.rstrip }
puts doc.root
#=> <html><body><ul>
#=> <li>First text</li><li><strong>Some text</strong></li>
#=>   <li>I am <b>embedded</b> and need <i>some</i>  <em>spaces</em>.</li></ul></body></html>

Edit #2: Here's how to strip it only from the first text node directly inside each <li>:

doc.xpath('//li/text()[1]').each{ |t| t.content = t.content.lstrip }

惟欲睡 2025-01-04 04:33:45

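You would traverse each li, removing leading whitespace until you find some text:

doc.css('li').each do |li|
  li.traverse do |node|
    next unless node.text?                       # only edit text nodes
    node.content = node.content.sub(/\A\s+/, '') # \A: start of the node's text, not of each line
    break unless node.content.empty?             # stop at the first real text
  end
end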
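
Why the regex anchor matters here: in Ruby, ^ matches at the start of every line, whereas \A matches only at the start of the string. As a result, gsub(/^\s+/, '') would also remove the indentation after each newline inside a multi-line text node. Here is a plain-Ruby comparison (no Nokogiri needed):

```ruby
text = "  first line\n  second line"

every_line   = text.gsub(/^\s+/, '')  # ^ anchors at the start of each line
string_start = text.sub(/\A\s+/, '')  # \A anchors only at the start of the string

p every_line                  # prints "first line\nsecond line"
p string_start                # prints "first line\n  second line"
p text.lstrip == string_start # prints true
```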

∝单色的世界 2025-01-04 04:33:45

When working with a Nokogiri::HTML.fragment, xpath("//text()") doesn't seem to work.

So here's what I came up with:

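doc.traverse do |node|
  if node.is_a? Nokogiri::XML::Text
    # Trim the start if the preceding element sibling is block-level,
    # or if there is none and the parent is block-level
    node.content = node.content.lstrip if node.previous_element&.description&.block?
    node.content = node.content.lstrip if node.previous_element.nil? && node.parent.description&.block?
    # Same check for the end, looking at the following element sibling
    node.content = node.content.rstrip if node.next_element&.description&.block?
    node.content = node.content.rstrip if node.next_element.nil? && node.parent.description&.block?
    # Drop text nodes that were whitespace only
    node.remove if node.content.empty?
  end
end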

Note: this uses the safe navigation operator (&.), which requires Ruby 2.3 or later.
