Parsing a blogspot XML file with Nokogiri

Posted on 2024-09-11 01:14:41

I have an XML file exported from blogspot, and it looks something like this:

<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>

How do I parse this with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("blogspot.xml"))

doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end

But it's not giving me any output.

Any suggestions?

西瓜 2024-09-18 01:14:41

Your code works for me. There were some problems with certain versions of Nokogiri.

I get:

 Content
 Content

I'm using nokogiri (1.4.1 x86-mswin32)

遗弃Ｍ 2024-09-18 01:14:41

It turns out that I had to delete the attributes on the feed tag:

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

云胡 2024-09-18 01:14:41

I just stumbled on this question. The issue appears to be XML namespaces:

"turns out that i had to delete the attributes for feed"

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

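XML namespaces complicate accessing nodes because they provide a way to separate similarly named tags. Here the default namespace declaration (xmlns='http://www.w3.org/2005/Atom') puts every unprefixed element, including content, in the Atom namespace, so an XPath query for a bare //content matches nothing. Read the "Namespaces" section of Searching an HTML / XML Document.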

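Nokogiri also has the remove_namespaces! method, which is sometimes a useful way of dealing with the problem, but it has downsides too: once the namespaces are stripped, same-named tags from different namespaces can no longer be told apart.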
