Efficiently bulk-updating a Rails database

I'm trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, then unzips it in <file_path>
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'


    #check last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate    if File.exists? file_path

    begin
      puts 'Downloading update...'
      #send conditional GET to server
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # file was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."


      # TODO: way, way too heavy; find a better solution.


      CSV.foreach(file_path) do |row|
        ip = Iptocs.new(  :ip_from        => row.shift,
                        :ip_to          => row.shift,
                        :country_code2  => row.shift,
                        :country_code3  => row.shift,
                        :country_name   => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-rescue
  end #task
end #namespace

The problem I'm having is that it takes a few minutes to insert the 100,000+ entries. I'd like to find a more efficient way of updating my database. Ideally the solution will remain database-agnostic, but if not, my production server runs MySQL.

Thank you for any insight.

5 Answers

埖埖迣鎅 2024-08-28 13:03:39

Have you tried using AR Extensions for bulk import? You get impressive performance improvements when inserting thousands of rows into the DB. Visit their website for more details.

Refer to these examples for more information

Usage Example 1

Usage Example 2

Usage Example 3
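For reference, a minimal sketch of what such a bulk import could look like with the question's Iptocs model. It assumes the Model.import(columns, values) API that AR Extensions (and its successor, activerecord-import) adds to ActiveRecord; the column names are taken from the question's code:

columns = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]

values = []
CSV.foreach(file_path) do |row|
  values << row[0, 5]  # first five CSV fields, in column order
end

# One multi-row INSERT instead of 100,000+ single-row INSERTs.
# :validate => false skips per-record validations for extra speed.
Iptocs.import(columns, values, :validate => false)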

软糯酥胸 2024-08-28 13:03:39

Use the database-level utilities for high speed, Luke!

Unfortunately, they're DB-specific. But they are fast.
For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
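As a sketch of how the rake task could hand the extracted CSV straight to MySQL (assuming the model's table is named iptocs and that LOCAL infile loading is enabled on the server; both are assumptions, not from the original post):

sql = <<-SQL
  LOAD DATA LOCAL INFILE '#{file_path}'
  INTO TABLE iptocs
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\\n'
  (ip_from, ip_to, country_code2, country_code3, country_name)
SQL

ActiveRecord::Base.connection.execute(sql)

This replaces the whole CSV loop with a single statement and is typically the fastest option, at the cost of being MySQL-specific.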

泪痕残 2024-08-28 13:03:39

You could generate a text file with all INSERTs you need and then execute:

mysql -u user -p db_name < mytextfile.txt

Not sure if this will be any faster but worth a try...
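A rough sketch of generating such a file from the extracted CSV (the output filename follows the answer; the quoting here is naive, doubling single quotes as MySQL accepts, and a real script would want more careful escaping):

File.open('mytextfile.txt', 'w') do |out|
  out.puts 'TRUNCATE TABLE iptocs;'  # stand-in for Iptocs.delete_all
  CSV.foreach(file_path) do |row|
    quoted = row[0, 5].map { |v| "'" + v.to_s.gsub("'", "''") + "'" }
    out.puts "INSERT INTO iptocs (ip_from, ip_to, country_code2, " \
             "country_code3, country_name) VALUES (#{quoted.join(', ')});"
  end
end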

豆芽 2024-08-28 13:03:39

I'm currently experimenting with activerecord-import, which sounds very promising:

https://github.com/zdennis/activerecord-import
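Building on the import API sketched under the AR Extensions answer above, here is how the question's CSV loop might look with activerecord-import, buffering rows and importing in batches so memory stays bounded (the batch size is an arbitrary choice):

columns = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]

buffer = []
CSV.foreach(file_path) do |row|
  buffer << row[0, 5]
  if buffer.size >= 5_000  # arbitrary batch size
    Iptocs.import(columns, buffer, :validate => false)
    buffer.clear
  end
end
# Import whatever is left over after the last full batch.
Iptocs.import(columns, buffer, :validate => false) unless buffer.empty?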

烟织青萝梦 2024-08-28 13:03:39

As Larry says, use your DB-specific import utilities if the file comes in the format you want. However, if you need to manipulate the data before inserting, you can generate a single INSERT query with data for many rows, which is faster than using a separate query for each row (as ActiveRecord will do). For example:

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES
  ('xxx', 'xxx', 'xxx'),
  ('yyy', 'yyy', 'yyy'),
  ...;
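A sketch of building that multi-row statement from Ruby, using ActiveRecord's connection for value quoting (the iptocs table name is carried over from the question; the column list is shortened to match the answer's example):

conn = ActiveRecord::Base.connection

rows = []
CSV.foreach(file_path) do |row|
  # conn.quote escapes and wraps each value for safe interpolation.
  rows << '(' + row[0, 3].map { |v| conn.quote(v) }.join(', ') + ')'
end

conn.execute(
  'INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES ' + rows.join(', ')
)

For 100,000+ rows you would likely split rows into chunks so each statement stays under MySQL's max_allowed_packet limit.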