Creating large XML files in Ruby
I want to write approximately 50MB of data to an XML file.
I found Nokogiri (1.5.0) to be efficient for parsing, that is, for reading XML rather than writing it. It is not a good option for writing a large XML file, since it holds the complete document in memory until it finally writes it out.
I found Builder (3.0.0) to be a good option but I'm not sure if it's the best option.
I tried some benchmarks with the following simple code:
(1..500000).each do |k|
  xml.products {
    xml.widget {
      xml.id_ k
      xml.name "Awesome widget"
    }
  }
end
Nokogiri took about 143 seconds, and its memory consumption increased gradually, ending at about 700 MB.
Builder took about 123 seconds, and its memory consumption stayed stable at about 10 MB.
So is there a better solution to write huge XML files (50 MB) in Ruby?
Here's the code using Nokogiri:
require 'rubygems'
require 'nokogiri'

a = Time.now
# The builder assembles the whole document in memory before it is serialized.
builder = Nokogiri::XML::Builder.new do |xml|
  xml.root {
    (1..500000).each do |k|
      xml.products {
        xml.widget {
          xml.id_ k
          xml.name "Awesome widget"
        }
      }
    end
  }
end
# The block form closes the file even if writing raises.
File.open("test_noko.xml", "w") { |o| o.write(builder.to_xml) }
puts(Time.now - a)
Here's the code using Builder:
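require 'rubygems'
require 'builder'

a = Time.now
File.open("test.xml", 'w') { |f|
  # With :target => f, Builder writes each element to the file as it is generated.
  # Note: unlike the Nokogiri version, this writes no enclosing root element.
  xml = Builder::XmlMarkup.new(:target => f, :indent => 1)
  (1..500000).each do |k|
    xml.products {
      xml.widget {
        xml.id_ k
        xml.name "Awesome widget"
      }
    }
  end
}
puts(Time.now - a)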
Solution 1
If speed is your main concern, I'd just use libxml-ruby directly:
The API is pretty straightforward:
Using :indent => true doesn't make much difference in this case, but for more complex XML files it might.
Solution 2
Of course, the fastest solution, and one that doesn't build up memory, is to write the XML manually, but that easily introduces other sources of error, such as possibly invalid XML:
Here's the code:
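The answer's original listing is not preserved here. As a rough sketch of the idea, assuming the same products/widget layout as the question, the file can be written line by line with plain string output. The names XML_ESCAPES, escape_xml and write_widgets are hypothetical helpers introduced for illustration. The escaping step is there because hand-written XML becomes invalid as soon as a value contains a character such as & or <.

```ruby
# Hypothetical helpers (not from the original answer), standard library only.

# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze

# Replace each special character with its XML entity.
def escape_xml(value)
  value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Stream `count` widget records to `path`, one line per record, so memory
# use does not grow with the size of the output.
def write_widgets(path, count, name = "Awesome widget")
  File.open(path, "w") do |f|
    f.puts '<?xml version="1.0" encoding="UTF-8"?>'
    f.puts "<root>"
    (1..count).each do |k|
      f.puts "<products><widget><id>#{k}</id>" \
             "<name>#{escape_xml(name)}</name></widget></products>"
    end
    f.puts "</root>"
  end
end
```

Calling write_widgets("test_manual.xml", 500000) would produce a file comparable to the benchmark output. Nothing here checks that tags are balanced or that names are legal, which is the trade-off the answer warns about.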