Efficiently bulk-updating a Rails database
I'm trying to build a rake utility that will update my database every so often.
This is the code I have so far:
namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, then unzips it in <file_path>
  # and updates the database.
  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'

    # Check the last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate if File.exists? file_path

    begin
      puts 'Downloading update...'
      # Send a conditional GET to the server.
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # File was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24} days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."

      # TODO: way, way too heavy; find a better solution.
      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(:ip_from       => row.shift,
                        :ip_to         => row.shift,
                        :country_code2 => row.shift,
                        :country_code3 => row.shift,
                        :country_name  => row.shift)
        ip.save
      end # CSV

      puts "Complete."

    end # begin-rescue
  end # task
end # namespace
The problem I'm having is that it takes a few minutes to insert the 100,000-plus entries. I'd like to find a more efficient way of updating my database. Ideally this would remain independent of the database type, but if that's not possible, my production server will be running MySQL.
Thank you for any insight.
Answers (5)
Have you tried using AR Extensions for bulk import? You get impressive performance improvements when you are inserting thousands of rows into the DB. Visit their website for more details.
Refer to these examples for more information
Usage Example 1
Usage Example 2
Usage Example 3
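A rough sketch of the idea (the exact API may vary by ar-extensions version; I'm assuming its import(columns, values) class method, and taking the Iptocs model and its columns from the question):

  # A minimal sketch, assuming ar-extensions provides Iptocs.import(columns, values).
  require 'csv'

  columns = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]

  values = []
  CSV.open(file_path, 'r') do |row|
    values << row
  end

  # One multi-row INSERT per batch instead of 100,000 individual INSERTs.
  values.each_slice(1000) do |batch|
    Iptocs.import columns, batch, :validate => false
  end

Skipping validations with :validate => false avoids instantiating an ActiveRecord object per row, which is where most of the time goes.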
Use the database-level utilities for high speed, Luke!
Unfortunately, they're DB-specific. But they are fast.
For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
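For instance, the whole CSV could be loaded in one statement from the rake task. This is only a sketch: LOAD DATA LOCAL INFILE is MySQL-specific, local-infile support must be enabled on both server and client, and the iptocs table and column names are assumed from the question's model:

  # A minimal sketch, assuming MySQL with local infile enabled;
  # file_path and the iptocs columns come from the question.
  sql = <<-SQL
    LOAD DATA LOCAL INFILE '#{file_path}'
    INTO TABLE iptocs
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (ip_from, ip_to, country_code2, country_code3, country_name)
  SQL
  ActiveRecord::Base.connection.execute(sql)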
You could generate a text file with all the INSERTs you need and then execute it in one go; something like the sketch below.
Not sure if this will be any faster, but worth a try...
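The answer's original command wasn't preserved, so this is a hedged reconstruction of the idea; the dump file name and the mysql client invocation are assumptions:

  # A minimal sketch: write one INSERT per CSV row to a SQL file.
  # NOTE: values are interpolated without escaping; fine for this trusted
  # CSV, but don't do this with untrusted input.
  File.open("#{RAILS_ROOT}/db/iptocs_dump.sql", 'w') do |f|
    CSV.open(file_path, 'r') do |row|
      f.puts "INSERT INTO iptocs (ip_from, ip_to, country_code2, country_code3, country_name) " \
             "VALUES ('#{row[0]}', '#{row[1]}', '#{row[2]}', '#{row[3]}', '#{row[4]}');"
    end
  end

Then execute it outside Rails with something like:

  mysql -u user -p my_app_production < db/iptocs_dump.sql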
I'm currently experimenting with activerecord-import, which sounds very promising:
https://github.com/zdennis/activerecord-import
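A minimal sketch of how that gem is typically used (activerecord-import is the successor to AR Extensions' import; besides the columns-and-values form shown above, it also accepts an array of unsaved model instances):

  # A minimal sketch, assuming the activerecord-import gem is installed.
  records = CSV.read(file_path).map do |row|
    Iptocs.new(:ip_from       => row[0],
               :ip_to         => row[1],
               :country_code2 => row[2],
               :country_code3 => row[3],
               :country_name  => row[4])
  end
  Iptocs.import records, :validate => false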
As Larry says, use your DB-specific import utilities if the file comes in the format you want. However, if you need to manipulate the data before inserting, you can generate a single INSERT query with data for many rows, which is faster than using a separate query for each row (as ActiveRecord will do). For example:
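The answer's original example wasn't preserved; this is a hedged reconstruction of the multi-row INSERT approach, with the table and column names taken from the question and quoting delegated to the ActiveRecord connection:

  # A minimal sketch: one INSERT statement carries many rows, so the
  # per-query overhead is paid once per batch rather than once per row.
  conn = ActiveRecord::Base.connection
  rows = CSV.read(file_path)

  rows.each_slice(500) do |batch|
    values = batch.map do |r|
      "(#{r.map { |v| conn.quote(v) }.join(', ')})"
    end.join(', ')

    conn.execute(
      "INSERT INTO iptocs (ip_from, ip_to, country_code2, country_code3, country_name) VALUES #{values}"
    )
  end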