Converting XML to YAML with Ruby and Hpricot - what's going wrong here?
I'm trying to output an XML file, blog.xml, as YAML, for dropping into Vision.app, a tool for designing Shopify e-commerce sites locally.
Shopify's yaml looks like this:
- id: 2
  handle: bigcheese-blog
  title: Bigcheese blog
  url: /blogs/bigcheese-blog
  articles:
    - id: 1
      title: 'One thing you probably did not know yet...'
      author: Justin
      content: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      created_at: 2005-04-04 16:00
      comments:
        -
          id: 1
          author: John Smith
          email: [email protected]
          content: Wow...great article man.
          status: published
          created_at: 2009-01-01 12:00
          updated_at: 2009-02-01 12:00
          url: ""
        -
          id: 2
          author: John Jones
          email: [email protected]
          content: I really enjoyed this article. And I love your shop! It's awesome. Shopify rocks!
          status: published
          created_at: 2009-03-01 12:00
          updated_at: 2009-02-01 12:00
          url: "http://somesite.com/"
    - id: 2
      title: Fascinating
      author: Tobi
      content: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      created_at: 2005-04-06 12:00
      comments:
  articles_count: 2
  comments_enabled?: true
  comment_post_url: ""
  comments_count: 2
  moderated?: true
However, a sample of my XML looks like this:
<article>
<author>Rouska Mellor</author>
<blog-id type="integer">273932</blog-id>
<body>Worn Again are hiring for a new Sales Director.
To view the full job description and details of how to apply click "here":http://antiapathy.org/?page_id=83</body>
<body-html><p>Worn Again are hiring for a new Sales Director.</p>
<p>To view the full job description and details of how to apply click <a href="http://antiapathy.org/?page_id=83">here</a></p></body-html>
<created-at type="datetime">2009-07-29T13:58:59+01:00</created-at>
<id type="integer">1179072</id>
<published-at type="datetime">2009-07-29T13:58:59+01:00</published-at>
<title>Worn Again are hiring!</title>
<updated-at type="datetime">2009-07-29T13:59:40+01:00</updated-at>
</article>
<article>
I naively assumed converting from one serialised data format to another was fairly straightforward, and I could simply do this:
>> require 'hpricot'
=> true
>> b = Hpricot.XML(open('blogs.xml'))
>> puts b.to_yaml
But I'm getting this error.
NoMethodError: undefined method `yaml_tag_subclasses?' for Hpricot::Doc:Class
from /usr/local/lib/ruby/1.8/yaml/tag.rb:69:in `taguri'
from /usr/local/lib/ruby/1.8/yaml/rubytypes.rb:16:in `to_yaml'
from /usr/local/lib/ruby/1.8/yaml.rb:391:in `call'
from /usr/local/lib/ruby/1.8/yaml.rb:391:in `emit'
from /usr/local/lib/ruby/1.8/yaml.rb:391:in `quick_emit'
from /usr/local/lib/ruby/1.8/yaml/rubytypes.rb:15:in `to_yaml'
from /usr/local/lib/ruby/1.8/yaml.rb:117:in `dump'
from /usr/local/lib/ruby/1.8/yaml.rb:432:in `y'
from (irb):6
from :0
>>
How can I get the data output in the form outlined at the top of this question? I've tried importing the 'yaml' gem, thinking that I'm missing some of those methods, but that hasn't helped either:
Comments (2)
I've found this. Maybe it could help.
http://brains.parslow.net/node/1623
Sorry, Josh, I think what you've found here is a limitation in Hpricot and/or the YAML library, pure and simple.
I'm not sure Hpricot's ever supported YAML in this way. The method in question is dynamically added by the YAML library to the Object class, as well as other fundamental Ruby types, but doesn't show up in Hpricot::Doc's definition for some reason, even though Hpricot::Doc does seem to inherit indirectly from Object.
I can say that I've reproduced it as well, so it's not just you.
You can very easily add the missing method:
but you'll find that doesn't get you much further. Here's what I get:
So we're not iterating over the container like we should.
At this point, to get YAML support using the YAML library, the brute-force way (maybe the only way) would be to add to_yaml methods to Hpricot's classes, to teach them how to output YAML correctly. Take a look at "/usr/lib/ruby/1.8/yaml/rubytypes.rb" (on a Mac, that'd be something like "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/yaml/rubytypes.rb") for examples of how that's done for each of the fundamental Ruby types. The classes you might need to add this to are defined on the C side: see "hpricot/ext/hpricot_scan/hpricot_scan.rl", in the method Init_hpricot_scan.
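A different route that avoids patching Hpricot at all is to walk the parsed document yourself, copy the values into plain Ruby hashes and arrays, and call to_yaml on those, since the YAML library already knows how to dump the basic types. What follows is only a rough sketch under some assumptions: it uses REXML (shipped with Ruby) rather than Hpricot, FIELD_MAP and both helper methods are made-up names for illustration, and the choice of which XML elements map to which Shopify keys is a guess based on the two samples in the question.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical mapping from the element names in the XML sample
# to the keys used in the Shopify-style YAML. Adjust to taste.
FIELD_MAP = {
  'id'         => 'id',
  'title'      => 'title',
  'author'     => 'author',
  'body'       => 'content',
  'created-at' => 'created_at'
}

# Copy the mapped child elements of one <article> into a plain Hash.
def article_to_hash(article)
  hash = {}
  article.each_element do |child|
    key = FIELD_MAP[child.name]
    next unless key                       # skip elements we do not map
    value = child.text.to_s
    # Honour the type="integer" hint that appears in the XML sample.
    value = value.to_i if child.attributes['type'] == 'integer'
    hash[key] = value
  end
  hash
end

# Convert every <article> found in an XML string into a YAML list.
def articles_to_yaml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//article').map { |a| article_to_hash(a) }.to_yaml
end
```

You would call it with something like puts articles_to_yaml(File.read('blogs.xml')). This only produces the list of articles; the surrounding blog entry (handle, url, articles_count and so on) and any comments would still have to be built as an outer hash by hand before dumping.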