Efficiently bulk-updating a Rails database
I'm trying to build a rake utility that will update my database every so often.
This is the code I have so far:
namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, then unzips it in <file_path>
  # and updates the database.
  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'

    # Check the last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate if File.exists? file_path

    begin
      puts 'Downloading update...'
      # Send a conditional GET to the server.
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # File was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24} days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."

      # TODO: way, way too heavy; find a better solution.
      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(:ip_from       => row.shift,
                        :ip_to         => row.shift,
                        :country_code2 => row.shift,
                        :country_code3 => row.shift,
                        :country_name  => row.shift)
        ip.save
      end # CSV

      puts "Complete."

    end # begin-rescue
  end # task
end # namespace
The problem I'm having is that it takes a few minutes to insert the 100,000-plus entries. I'd like to find a more efficient way of updating my database. Ideally this would remain independent of the database type, but if that's not possible, my production server will be running MySQL.
Thank you for any insight.
Answers (5)
Have you tried using AR Extensions for bulk import? You get impressive performance improvements when you are inserting thousands of rows into the DB. Visit their website for more details.
Refer to these examples for more information
Usage Example 1
Usage Example 2
Usage Example 3
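A rough sketch of the idea (the exact API may vary by ar-extensions version; I'm assuming its import(columns, values) class method, and taking the Iptocs model and its columns from the question):

  # A minimal sketch, assuming ar-extensions provides Iptocs.import(columns, values).
  require 'csv'

  columns = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]

  values = []
  CSV.open(file_path, 'r') do |row|
    values << row
  end

  # One multi-row INSERT per batch instead of 100,000 individual INSERTs.
  values.each_slice(1000) do |batch|
    Iptocs.import columns, batch, :validate => false
  end

Skipping validations with :validate => false avoids instantiating an ActiveRecord object per row, which is where most of the time goes.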
Use the database-level utilities for high speed, Luke!
Unfortunately, they're DB-specific. But they are fast.
For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
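For instance, the whole CSV could be loaded in one statement from the rake task. This is only a sketch: LOAD DATA LOCAL INFILE is MySQL-specific, local-infile support must be enabled on both server and client, and the iptocs table and column names are assumed from the question's model:

  # A minimal sketch, assuming MySQL with local infile enabled;
  # file_path and the iptocs columns come from the question.
  sql = <<-SQL
    LOAD DATA LOCAL INFILE '#{file_path}'
    INTO TABLE iptocs
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (ip_from, ip_to, country_code2, country_code3, country_name)
  SQL
  ActiveRecord::Base.connection.execute(sql)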
You could generate a text file with all the INSERTs you need and then execute it in one go; something like the sketch below.
Not sure if this will be any faster, but worth a try...
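The answer's original command wasn't preserved, so this is a hedged reconstruction of the idea; the dump file name and the mysql client invocation are assumptions:

  # A minimal sketch: write one INSERT per CSV row to a SQL file.
  # NOTE: values are interpolated without escaping; fine for this trusted
  # CSV, but don't do this with untrusted input.
  File.open("#{RAILS_ROOT}/db/iptocs_dump.sql", 'w') do |f|
    CSV.open(file_path, 'r') do |row|
      f.puts "INSERT INTO iptocs (ip_from, ip_to, country_code2, country_code3, country_name) " \
             "VALUES ('#{row[0]}', '#{row[1]}', '#{row[2]}', '#{row[3]}', '#{row[4]}');"
    end
  end

Then execute it outside Rails with something like:

  mysql -u user -p my_app_production < db/iptocs_dump.sql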
I'm currently experimenting with activerecord-import, which sounds very promising:
https://github.com/zdennis/activerecord-import
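A minimal sketch of how that gem is typically used (activerecord-import is the successor to AR Extensions' import; besides the columns-and-values form shown above, it also accepts an array of unsaved model instances):

  # A minimal sketch, assuming the activerecord-import gem is installed.
  records = CSV.read(file_path).map do |row|
    Iptocs.new(:ip_from       => row[0],
               :ip_to         => row[1],
               :country_code2 => row[2],
               :country_code3 => row[3],
               :country_name  => row[4])
  end
  Iptocs.import records, :validate => false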
As Larry says, use your DB-specific import utilities if the file comes in the format you want. However, if you need to manipulate the data before inserting, you can generate a single INSERT query with data for many rows, which is faster than using a separate query for each row (as ActiveRecord will do). For example:
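The answer's original example wasn't preserved; this is a hedged reconstruction of the multi-row INSERT approach, with the table and column names taken from the question and quoting delegated to the ActiveRecord connection:

  # A minimal sketch: one INSERT statement carries many rows, so the
  # per-query overhead is paid once per batch rather than once per row.
  conn = ActiveRecord::Base.connection
  rows = CSV.read(file_path)

  rows.each_slice(500) do |batch|
    values = batch.map do |r|
      "(#{r.map { |v| conn.quote(v) }.join(', ')})"
    end.join(', ')

    conn.execute(
      "INSERT INTO iptocs (ip_from, ip_to, country_code2, country_code3, country_name) VALUES #{values}"
    )
  end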