Creating a large XML file in Ruby

Posted on 2024-12-05 12:20:52

I want to write approximately 50MB of data to an XML file.

Nokogiri (1.5.0) parses efficiently when I only need to read XML. It is a poor fit for writing a large file, though, because it holds the complete document in memory until it is finally written out.

Builder (3.0.0) looks like a good option, but I'm not sure it's the best one.

I tried some benchmarks with the following simple code:

  (1..500000).each do |k|
    xml.products {
      xml.widget {
        xml.id_ k
        xml.name "Awesome widget"
      }
    }
  end

Nokogiri took about 143 seconds, and its memory consumption grew steadily, ending at about 700 MB.

Builder took about 123 seconds, and its memory consumption stayed stable at about 10 MB.

So is there a better solution to write huge XML files (50 MB) in Ruby?

Here's the code using Nokogiri:

require 'rubygems'
require 'nokogiri'

start = Time.now
# The builder assembles the whole document in memory before anything is written.
builder = Nokogiri::XML::Builder.new do |xml|
  xml.root {
    (1..500000).each do |k|
      xml.products {
        xml.widget {
          xml.id_ k
          xml.name "Awesome widget"
        }
      }
    end
  }
end
# The block form closes the file even if serialization raises.
File.open("test_noko.xml", "w") { |f| f.write(builder.to_xml) }
puts Time.now - start

Here's the code using Builder:

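require 'rubygems'
require 'builder'

start = Time.now
File.open("test.xml", "w") { |f|
  # Builder writes each element to the target as it is generated.
  # Note: unlike the Nokogiri version, no <root> wrapper is emitted here.
  xml = Builder::XmlMarkup.new(:target => f, :indent => 1)

  (1..500000).each do |k|
    xml.products {
      xml.widget {
        xml.id_ k
        xml.name "Awesome widget"
      }
    }
  end
}
puts Time.now - start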
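
For repeatable timings, Ruby's standard Benchmark module can wrap either script above. This is only a minimal sketch: the helper name time_xml_write, the output path, and the plain-Ruby stand-in writer are assumptions made for illustration, not part of the original benchmark. Swap the Nokogiri or Builder code into the block to compare them.

```ruby
require 'benchmark'
require 'tmpdir'

# Hypothetical helper (not from the original post): opens the output file,
# yields it to the block that writes the XML, and returns elapsed seconds.
def time_xml_write(path)
  Benchmark.realtime do
    File.open(path, 'w') { |f| yield f }
  end
end

path = File.join(Dir.tmpdir, 'bench_test.xml')

# Stand-in writer so the sketch runs without any gems installed.
elapsed = time_xml_write(path) do |f|
  f.puts '<root>'
  1000.times { |k| f.puts "<products><widget><id>#{k}</id></widget></products>" }
  f.puts '</root>'
end

puts format('%.3f s, %d bytes', elapsed, File.size(path))
```

Benchmark.realtime only measures wall-clock time; memory still has to be observed separately, for example with the operating system's process monitor.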

Comments (1)

并安 2024-12-12 12:20:52

Solution 1

If speed is your main concern, I'd just use libxml-ruby directly:

$ time ruby test.rb 

real    0m7.352s
user    0m5.867s
sys     0m0.921s

The API is pretty straightforward:

require 'rubygems'
require 'xml'  # provided by the libxml-ruby gem

doc = XML::Document.new
doc.root = XML::Node.new('root_node')
root = doc.root

500000.times do |k|
  # Create each node, then append it to its parent with <<.
  products = XML::Node.new('products')
  root << products
  widget = XML::Node.new('widget')
  products << widget
  widget['id'] = k.to_s
  widget['name'] = 'Awesome widget'
end

doc.save('foo.xml', :indent => false, :encoding => XML::Encoding::UTF_8)

Using :indent => true doesn't make much difference in this case, but for more complex XML files it might.

$ time ruby test.rb #(with indent)

real    0m7.395s
user    0m6.050s
sys     0m0.847s

Solution 2

Of course, the fastest solution, and one that doesn't build up memory, is to write the XML by hand. That approach easily introduces other sources of error, though, such as invalid XML:

$ time ruby test.rb 

real    0m1.131s
user    0m0.873s
sys     0m0.126s

Here's the code:

File.open("foo.xml", "w") do |f|
  f.puts('<doc>')
  500000.times do |k|
    f.puts "<product><widget id=\"#{k}\" name=\"Awesome widget\" /></product>"
  end
  f.puts('</doc>')
end
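
The invalid-XML risk mentioned above comes largely from unescaped characters in the data. Here is a minimal sketch, assuming Ruby 1.9 or later, that keeps the hand-written approach but escapes attribute values with the built-in String#encode(:xml => :attr). The helper name write_products and the sample names are hypothetical, chosen only to show the escaping.

```ruby
require 'stringio'

# Hypothetical helper: streams one <product> per name to any IO-like object,
# so memory use stays flat regardless of how many names are written.
def write_products(io, names)
  io.puts '<doc>'
  names.each_with_index do |name, k|
    # encode(:xml => :attr) escapes &, <, > and " and adds the surrounding quotes.
    id_attr   = k.to_s.encode(:xml => :attr)
    name_attr = name.encode(:xml => :attr)
    io.puts "<product><widget id=#{id_attr} name=#{name_attr} /></product>"
  end
  io.puts '</doc>'
end

out = StringIO.new
write_products(out, ['Awesome widget', 'Nuts & "bolts" <large>'])
puts out.string
```

Passing a File opened with File.open instead of the StringIO gives the same streaming behaviour as the code above, at the cost of one extra encode call per attribute.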