Ruby on Rails - 从 CSV 文件导入数据

发布于 2024-10-06 17:22:03 字数 526 浏览 5 评论 0 原文

我想将 CSV 文件中的数据导入到现有数据库表中。我不想保存 CSV 文件,只需从中获取数据并将其放入现有表中。我正在使用 Ruby 1.9.2 和 Rails 3。

这是我的表格:

create_table "mouldings", :force => true do |t|
  t.string   "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "name"
  t.integer  "supplier_id"
  t.decimal  "length",         :precision => 3, :scale => 2
  t.decimal  "cost",           :precision => 4, :scale => 2
  t.integer  "width"
  t.integer  "depth"
end

您能给我一些代码来告诉我执行此操作的最佳方法吗,谢谢。

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.

This is my table:

create_table "mouldings", :force => true do |t|
  t.string   "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "name"
  t.integer  "supplier_id"
  t.decimal  "length",         :precision => 3, :scale => 2
  t.decimal  "cost",           :precision => 4, :scale => 2
  t.integer  "width"
  t.integer  "depth"
end

Can you give me some code to show me the best way to do this, thanks.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

我是有多爱你 2024-10-13 17:22:03
require 'csv'    

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
require 'csv'    

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
百变从容 2024-10-13 17:22:03

yfeldblum 答案的更简单版本,更简单,也适用于大文件:

require 'csv'    

CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end

不需要 with_in Different_accesssymbolize_keys,也不需要先在文件中读入字符串。

它不会立即将整个文件保存在内存中,而是逐行读取并每行创建一个 Molding。

Simpler version of yfeldblum's answer, that is simpler and works well also with large files:

require 'csv'    

CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end

No need for with_indifferent_access or symbolize_keys, and no need to read in the file to a string first.

It doesnt't keep the whole file in memory at once, but reads in line by line and creates a Moulding per line.

熊抱啵儿 2024-10-13 17:22:03

smarter_csv gem 是专门为此用例创建的:从 CSV 文件读取数据并快速创建数据库条目。

  require 'smarter_csv'
  options = {}
  SmarterCSV.process('input_file.csv', options) do |chunk|
    chunk.each do |data_hash|
      Moulding.create!( data_hash )
    end
  end

您可以使用选项chunk_size一次读取N个csv行,然后在内部循环中使用Resque来生成将创建新记录的作业,而不是立即创建它们 - 这样您可以将生成条目的负载分散给多个工作人员。

参见:
https://github.com/tilo/smarter_csv

The smarter_csv gem was specifically created for this use-case: to read data from CSV file and quickly create database entries.

  require 'smarter_csv'
  options = {}
  SmarterCSV.process('input_file.csv', options) do |chunk|
    chunk.each do |data_hash|
      Moulding.create!( data_hash )
    end
  end

You can use the option chunk_size to read N csv-rows at a time, and then use Resque in the inner loop to generate jobs which will create the new records, rather than creating them right away - this way you can spread the load of generating entries to multiple workers.

See also:
https://github.com/tilo/smarter_csv

骄兵必败 2024-10-13 17:22:03

您可以尝试 Upsert

require 'upsert' # add this to your Gemfile
require 'csv'    

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

如果这是您想要的,您也可以考虑从表中删除自动增量主键并将主键设置为name。或者,如果存在形成主键的某种属性组合,请将其用作选择器。不需要索引,它只会使速度更快。

You might try Upsert:

require 'upsert' # add this to your Gemfile
require 'csv'    

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

If this is what you want, you might also consider getting rid of the auto-increment primary key from the table and setting the primary key to name. Alternatively, if there is some combination of attributes that form a primary key, use that as the selector. No index is necessary, it will just make it faster.

两人的回忆 2024-10-13 17:22:03

这会有所帮助。它也有代码示例:

http://csv-mapper.rubyforge.org/

或者用于 rake执行相同操作的任务:

http://erikonrails.snowedin.net/?p=212

This can help. It has code examples too:

http://csv-mapper.rubyforge.org/

Or for a rake task for doing the same:

http://erikonrails.snowedin.net/?p=212

迟到的我 2024-10-13 17:22:03

最好将数据库相关过程包装在 transaction 块内。代码片段打击是将一组语言播种到语言模型的完整过程,

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

下面的片段是 languages.csv 文件的一部分,

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...

It is better to wrap the database related process inside a transaction block. Code snippet blow is a full process of seeding a set of languages to Language model,

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

Snippet below is a partial of languages.csv file,

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...
浪漫之都 2024-10-13 17:22:03

更好的方法是将其包含在 rake 任务中。在 /lib/tasks/ 内创建 import.rake 文件并将此代码放入该文件中。

desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end

之后在终端中运行此命令rake csv_model_import[file.csv,Name_of_the_Model]

The better way is to include it in a rake task. Create import.rake file inside /lib/tasks/ and put this code to that file.

desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end

After that run this command in your terminal rake csv_model_import[file.csv,Name_of_the_Model]

请叫√我孤独 2024-10-13 17:22:03

我知道这是一个老问题,但它仍然在谷歌的前 10 个链接中。

逐一保存行的效率不是很高,因为它会导致循环中的数据库调用,最好避免这种情况,特别是当您需要插入大量数据时。

使用批量插入更好(而且速度更快)。

INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
    ('s1', 'supplier1', 1.111), 
    ('s2', 'supplier2', '2.222')

您可以手动构建这样的查询,然后执行 Model.connection.execute(RAW SQL STRING) (不推荐)
或者使用 gem activerecord-import (它于 2010 年 8 月 11 日首次发布),在这种情况下只需将数据放入数组 rows 中并调用 Model.import rows code>

详细信息请参阅 gem 文档

I know it's old question but it still in first 10 links in google.

It is not very efficient to save rows one-by-one because it cause database call in the loop and you better avoid that, especially when you need to insert huge portions of data.

It's better (and significantly faster) to use batch insert.

INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
    ('s1', 'supplier1', 1.111), 
    ('s2', 'supplier2', '2.222')

You can build such a query manually and than do Model.connection.execute(RAW SQL STRING) (not recomended)
or use gem activerecord-import (it was first released on 11 Aug 2010) in this case just put data in array rows and call Model.import rows

refer to gem docs for details

帅气称霸 2024-10-13 17:22:03

使用这个宝石:
https://rubygems.org/gems/active_record_importer

class Moulding < ActiveRecord::Base
  acts_as_importable
end

然后您现在可以使用:

Moulding.import!(file: File.open(PATH_TO_FILE))

只需确保您的标题与表的列名称匹配

Use this gem:
https://rubygems.org/gems/active_record_importer

class Moulding < ActiveRecord::Base
  acts_as_importable
end

Then you may now use:

Moulding.import!(file: File.open(PATH_TO_FILE))

Just be sure to that your headers match the column names of your table

最终幸福 2024-10-13 17:22:03

以下模块可以在任何模型上扩展,它将根据 CSV 中定义的列标题导入数据。

注意:

  • 这是一个很棒的内部工具,对于客户使用,我建议添加保护措施和清理
  • CSV 中的列名称必须与数据库模式完全相同,否则它将无法工作
  • 可以进一步改进使用表名称获取标题与在文件中定义标题

models/concerns 文件夹中创建名为 "csv_importer.rb" 的文件

module CsvImporter
  extend ActiveSupport::Concern  
  require 'csv'
  
  def convert_csv_to_book_attributes(csv_path)
    csv_rows = CSV.open(csv_path).each.to_a.compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    
    return columns, csv_rows
  end
  
  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    
    message = ""
    begin
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    
    return message
  end
end

添加 extend CsvImporter 到您想要将此功能扩展到的任何模型。

在您的控制器中,您可以执行如下操作来利用此功能:

def import_file
   model_name = params[:table_name].singularize.camelize.constantize
   csv = params[:file].path
   @message = model_name.import_by_csv(csv)
end

The following module can be extended on any model and it will import the data according to the column headers defined in the CSV.

Note:

  • This is a great internal tool, for customer use I would recommend adding safeguards and sanitization
  • The column names in the CSV must be exactly like the DB schema or it won't work
  • It can be further improved by using the table name to get the headers vs defining them in the file

Create a file named "csv_importer.rb" in your models/concerns folder

module CsvImporter
  extend ActiveSupport::Concern  
  require 'csv'
  
  def convert_csv_to_book_attributes(csv_path)
    csv_rows = CSV.open(csv_path).each.to_a.compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    
    return columns, csv_rows
  end
  
  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    
    message = ""
    begin
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    
    return message
  end
end

Add extend CsvImporter to whichever model you would like to extend this functionality to.

In your controller you can have an action like the following to utilize this functionality:

def import_file
   model_name = params[:table_name].singularize.camelize.constantize
   csv = params[:file].path
   @message = model_name.import_by_csv(csv)
end
不离久伴 2024-10-13 17:22:03

最好使用 CSV::Table 并使用 String.encode(universal_newline: true)。它将 CRLF 和 CR 转换为 LF

It's better to use CSV::Table and use String.encode(universal_newline: true). It converting CRLF and CR to LF

铃予 2024-10-13 17:22:03

如果您想使用 SmartCSV

all_data = SmarterCSV.process(
             params[:file].tempfile, 
             { 
               :col_sep => "\t", 
               :row_sep => "\n" 
             }
           )

这表示每行 "\t" 中制表符分隔的数据,其中行由换行符分隔 "\n"

If you want to Use SmartCSV

all_data = SmarterCSV.process(
             params[:file].tempfile, 
             { 
               :col_sep => "\t", 
               :row_sep => "\n" 
             }
           )

This represents tab delimited data in each row "\t" with rows separated by new lines "\n"

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文