Nokogiri and finding elements by name

Posted on 2024-11-03 19:29:26

I am parsing an XML file using Nokogiri with the following snippet:

doc.xpath('//root').each do |root|
  puts "# ROOT found"
  root.xpath('//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |content|
      ...
    end
  end
end

How can I parse through all elements in the page element? There are three different elements: image, text and video. How can I make a case statement for each element?

烟雨凡馨 2024-11-10 19:29:26

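Honestly, you look pretty close to me. Dispatch on each child's name with a case statement. Two details to watch: an XPath that starts with // searches the whole document even when called on a node, and Nokogiri text nodes also report their name as "text", so whitespace between tags would match the 'text' branch if you iterate over children.

doc.xpath('//root').each do |root|
  puts "# ROOT found"
  # './/page' keeps the search inside this root; '//page' would scan the whole document
  root.xpath('.//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    # element_children skips text nodes, whose name is also "text"
    page.element_children.each do |child|
      case child.name
      when 'image'
        do_image_stuff
      when 'text'
        do_text_stuff
      when 'video'
        do_video_stuff
      end
    end
  end
end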

习ぎ惯性依靠 2024-11-10 19:29:26

Both Nokogiri's CSS and XPath accessors accept multiple tags in a single query, which is useful for this sort of problem: you can ask for just the tags you want rather than walking through every tag inside the document's page tag. Given this sample document:

require 'nokogiri'

doc = Nokogiri::XML('
  <xml>
  <body>
  <image>image</image>
  <text>text</text>
  <video>video</video>
  <other>other</other>
  <image>image</image>
  <text>text</text>
  <video>video</video>
  <other>other</other>
  </body>
  </xml>')

This is a search using CSS:

doc.search('image, text, video').each do |node|
  case node.name
  when 'image'
    puts node.text
  when 'text'
    puts node.text
  when 'video'
    puts node.text
  else
    puts 'should never get here'
  end
end

# >> image
# >> image
# >> text
# >> text
# >> video
# >> video

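Notice that in the output above the tags come back grouped in the order the CSS selectors were listed. This ordering depends on the Nokogiri version, and newer releases may return the nodes in document order instead, so check it against the version you run. If you need the tags in document order, you can use XPath: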

doc.search('//image | //text | //video').each do |node|
  puts node.text
end

# >> image
# >> text
# >> video
# >> image
# >> text
# >> video

In either case, the program should run faster because all the searching occurs in libxml2, which returns only the nodes you need for processing in Ruby.

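If you need to restrict the search to within a <page> tag, you can do a search up front to find the page node, then search underneath it. (The sample document above has no <page> tag, so these are patterns for your own document.)

doc.at('page').search('image, text, video').each do |node|
  ...
end

or, with XPath, using paths that start with .// so the search stays inside the page node (a path starting with // would search the whole document again):

doc.at('//page').search('.//image | .//text | .//video').each do |node|
  ...
end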
