What is the smartest way to import massive datasets into a Rails app?

Posted 2024-08-25 07:13:03

I've got multiple massive (multi-gigabyte) datasets that I need to import into a Rails app. Each dataset currently lives in its own database on my development machine, and I need to read from them and create rows in tables in my Rails database based on the information they contain. The tables in my Rails database will not be exactly the same as the tables in the source databases.

What's the smartest way to go about this?

I was thinking of using migrations, but I'm not sure how to connect a migration to the source databases, and even if that is possible, is it going to be ridiculously slow?

Comments (2)

海之角 2024-09-01 07:13:04

Since georgian suggested it, I'll post my comment as an answer:

If the changes are superficial (column names changed, columns removed, etc.), then I would just manually export the data from the old database into the new one, and then run a migration to change the columns.

戴着白色围巾的女孩 2024-09-01 07:13:03

Without seeing the schemas or knowing the logic you want to apply to each row, I would say the fastest way to import this data is to create a view of the table you want to export, with the columns in the order you want (doing any processing in SQL), and then do a SELECT ... INTO OUTFILE on that view. You can then take the resulting file and import it into the target database.

This will not allow you to use any Rails model validations on the imported data, though.
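As a rough sketch of the view-and-outfile route, the helper below only builds the SQL strings; it does not run them. It assumes MySQL on both sides (which is where `SELECT ... INTO OUTFILE` and `LOAD DATA INFILE` come from). The helper name `export_import_sql` and the table, column, and file names are all made up for illustration.

```ruby
# Hypothetical helper: builds MySQL statements for the view -> outfile -> load route.
# column_map maps source column names to target column names, in target column order.
def export_import_sql(source_table:, target_table:, column_map:, outfile:)
  select_list = column_map.map { |src, dst| "`#{src}` AS `#{dst}`" }.join(", ")
  target_cols = column_map.values.map { |col| "`#{col}`" }.join(", ")
  view = "#{source_table}_export"
  # %q keeps the backslash in '\n' literal so MySQL sees the escape itself.
  csv_opts = %q{FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'}

  {
    create_view: "CREATE VIEW `#{view}` AS SELECT #{select_list} FROM `#{source_table}`",
    export: "SELECT * FROM `#{view}` INTO OUTFILE '#{outfile}' #{csv_opts}",
    import: "LOAD DATA INFILE '#{outfile}' INTO TABLE `#{target_table}` #{csv_opts} (#{target_cols})"
  }
end

# Example with invented names: rename two legacy columns on the way out.
sql = export_import_sql(
  source_table: "legacy_customers",
  target_table: "users",
  column_map: { "cust_name" => "name", "cust_email" => "email" },
  outfile: "/tmp/users.csv"
)
puts sql.values_at(:create_view, :export, :import)
```

The first two statements would run against the source database and the third against the Rails database. Note that `INTO OUTFILE` writes the file on the database server's host, and that nothing here passes through a Rails model.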

Otherwise, you have to go the slow way: create a model for each source database/table to extract the data (http://programmerassist.com/article/302 tells you how to connect to a different database for a given model) and import it that way. This is going to be quite slow, but you could set up a monster EC2 instance and run it as fast as possible.
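The shape of that slower, row-by-row route might look something like the sketch below. It is plain Ruby so it runs without a database: the source is any enumerable of hashes and the sink is a block. In a real Rails app the source would presumably be a model pointed at the legacy database via `establish_connection` and iterated in batches, and the sink would create records through the target model. The column names and the `to_target_attrs` check are invented stand-ins for the real mapping and validations.

```ruby
# Hypothetical transform: maps one source row (a Hash) to target attributes.
# Returns nil when the row fails a simple check standing in for a model validation.
def to_target_attrs(row)
  email = row["cust_email"].to_s.strip.downcase
  return nil if email.empty?

  { name: row["cust_name"].to_s.strip, email: email }
end

# Walks the source rows in fixed-size batches, transforms them, and yields each
# batch of valid rows to the block. Returns counts of imported and skipped rows.
def import_in_batches(source_rows, batch_size: 1000)
  stats = { imported: 0, skipped: 0 }
  source_rows.each_slice(batch_size) do |batch|
    attrs = batch.map { |row| to_target_attrs(row) }
    valid = attrs.compact
    stats[:skipped] += attrs.size - valid.size
    yield valid unless valid.empty?
    stats[:imported] += valid.size
  end
  stats
end

# Stand-ins for the real source and target tables.
source = [
  { "cust_name" => " Ada ", "cust_email" => "ADA@example.com" },
  { "cust_name" => "Nobody", "cust_email" => "" },
  { "cust_name" => "Bob", "cust_email" => "bob@example.com" }
]
target = []
stats = import_in_batches(source, batch_size: 2) { |rows| target.concat(rows) }
p stats
```

Working in batches keeps memory bounded on multi-gigabyte tables, and counting skipped rows makes it visible when the validation step is dropping data.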

Migrations would work for this, but I wouldn't recommend them for something like this.
