Nokogiri: replace inner text with <span>-wrapped words

Posted on 2024-12-04 21:18:36

Here's an example HTML fragment:

<p class="stanza">Thus grew the tale of Wonderland:<br/>
  Thus slowly, one by one,<br/>
  Its quaint events were hammered out -<br/>
  And now the tale is done,<br/>
  And home we steer, a merry crew,<br/>
  Beneath the setting sun.<br/></p>

I need to surround each word with a <span> that carries a numbered id, like this:

<span id='w1'>Anon,</span> <span id='w2'>to</span> <span id='w3'>sudden</span> 
<span id='w4'>silence</span> <span id='w5'>won,</span> ....

I have written this, which creates the new fragment. How do I swap the new fragment in for the old text?

def callchildren(n)
  n.children.each do |n| # call recursively until arrive at a node w/o children
    callchildren(n)
  end
  if n.node_type == 3 && n.to_s.strip.empty? != true 
    new_node = ""
    n.to_s.split.each { |w|
      new_node = new_node + "<span id='w#{$word_number}'>#{w}</span> "
      $word_number += 1
    }  
    # puts new_node 
    # HELP? How do I get new_node swapped in?
  end
end

Comments (2)

故人如初 2024-12-11 21:18:36

My attempt to provide a solution for your problem:

require 'nokogiri'

Inf = 1.0/0.0

def number_words(node, counter = nil)
  # define infinite counter (Ruby >= 1.8.7)
  counter ||= (1..Inf).each
  doc = node.document

  unless node.is_a?(Nokogiri::XML::Text)
    # recurse for children and collect all the returned
    # nodes into an array
    children = node.children.inject([]) { |acc, child|
      acc += number_words(child, counter)
    }
    # replace the node's children
    node.children = Nokogiri::XML::NodeSet.new(doc, children)
    return [node]
  end

  # for text nodes, we generate a list of span nodes
  # and return it (this is more secure than OP's original
  # approach that is vulnerable to HTML injection)
  # note: content gives the unescaped text; to_s would return escaped markup
  node.content.split.inject([]) { |acc, word|
    span = Nokogiri::XML::Node.new("span", doc)
    span.content = word
    span["id"] = "w#{counter.next}"
    # add a space if we are not at the beginning
    acc << Nokogiri::XML::Text.new(" ", doc) unless acc.empty?
    # add our new span to the collection
    acc << span
  }
end

# demo
if __FILE__ == $0
  h = <<-HTML
  <p class="stanza">Thus grew the tale of Wonderland:<br/>
    Thus slowly, one by one,<br/>
    Its quaint events were hammered out -<br/>
    And now the tale is done,<br/>
    And home we steer, a merry crew,<br/>
    Beneath the setting sun.<br/></p>
  HTML

  doc = Nokogiri::HTML.parse(h)
  # start at <body> so the document node and DTD are left untouched
  number_words(doc.at_css("body"))
  puts doc.to_html
end

泪眸﹌ 2024-12-11 21:18:36

Given a Nokogiri::HTML::Document in doc, you could do something like this:

i = 0
doc.search('//p[@class="stanza"]/text()').each do |n|
    spans = n.content.scan(/\S+/).map do |s|
        "<span id=\"w#{i += 1}\">" + s + '</span>'
    end
    n.replace(spans.join(' '))
end