Periodically importing data from files on Heroku

Posted on 2024-11-11 17:29:44


I need to periodically import some data into my Rails app on Heroku.

The task to execute is split into the following parts (a rough sketch follows the list):
* download a big zip file (e.g. ~100 MB) from a website
* unzip the file (the unzipped data is ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* clean up
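For reference, here is a rough sketch of those steps in Ruby, assuming the zip contains CSV files; the URL, paths, the Product model, and the sku column are made-up placeholders, and the unzip step uses the rubyzip gem:

require 'open-uri'
require 'zip'        # rubyzip gem
require 'csv'
require 'fileutils'

zip_url  = 'http://example.com/data.zip'   # placeholder URL
zip_path = 'tmp/data.zip'
work_dir = 'tmp/import'

# 1. download the big zip file (reads the whole ~100 MB into memory; fine for a one-off task)
File.open(zip_path, 'wb') { |f| f.write(URI.open(zip_url).read) }

# 2. unzip it into a working directory
FileUtils.mkdir_p(work_dir)
Zip::File.open(zip_path) do |zip|
  zip.each do |entry|
    dest = File.join(work_dir, entry.name)
    FileUtils.mkdir_p(File.dirname(dest))
    entry.extract(dest) unless File.exist?(dest)
  end
end

# 3. read the extracted files and create or update records
# (hypothetical model/column; assumes the CSV headers match column names)
Dir.glob(File.join(work_dir, '**', '*.csv')).each do |csv_file|
  CSV.foreach(csv_file, headers: true) do |row|
    record = Product.find_or_initialize_by(sku: row['sku'])
    record.update(row.to_h)
  end
end

# 4. cleanup
FileUtils.rm_rf(work_dir)
File.delete(zip_path)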

How can I do this on Heroku? Is it better to use some external storage (e.g. S3)?
How would you approach such a thing?

Ideally this needs to run every night.


Comments (1)

扎心 2024-11-18 17:29:44


I tried the exact same thing a couple of days back, and the conclusion I came to was that it can't be done, because of the memory limit Heroku imposes on each process. (I was building a data structure from the files I read from the internet and trying to push it to the DB.)

I was using a rake task that would pull and parse a couple of big files and then populate the database.

As a workaround, I now run that rake task on my local machine, push the resulting database dump to S3, and issue a heroku command from my local machine to restore the Heroku DB instance.

"heroku pgbackups:restore 'http://s3.amazonaws.com/#{yourfilepath}' --app  #{APP_NAME} --confirm #{APP_NAME}"

You could push the dump to S3 using the fog library:

require 'rubygems'
require 'fog'

# open a connection to S3
connection = Fog::Storage.new(
    :provider              => 'AWS',
    :aws_secret_access_key => "#{YOUR_SECRET}",
    :aws_access_key_id     => "#{YOUR_ACCESS_KEY}"
)

# look up the bucket that holds the backups
directory = connection.directories.get("#{YOUR_BACKUP_DIRECTORY}")

# upload the dump file and make it publicly readable
file = directory.files.create(
    :key    => "#{REMOTE_FILE_NAME}",
    :body   => File.open("#{LOCAL_BACKUP_FILE_PATH}"),
    :public => true
)

The command that I use to create the database dump on my local machine is:

system "PGPASSWORD=#{YOUR_DB_PASSWORD} pg_dump -Fc --no-acl --no-owner -h localhost -U #{YOUR_DB_USER_NAME} #{YOUR_DB_DATABASE_NAME} > #{LOCAL_BACKUP_FILE_PATH}"

I have written a rake task that automates all of these steps.
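For what it's worth, a minimal sketch of such a rake task might look like the following; the file name, bucket, app name, paths, and ENV variables are placeholders, and it just strings together the pg_dump, fog upload, and pgbackups:restore steps shown above:

# lib/tasks/push_db.rake -- hypothetical file and task names
namespace :db do
  desc "Dump the local DB, upload the dump to S3, and restore it on Heroku"
  task :push_to_heroku do
    require 'fog'

    dump_path = "tmp/latest.dump"       # local dump file (placeholder)
    bucket    = "my-backup-bucket"      # S3 bucket (placeholder)
    key       = "backups/latest.dump"   # S3 object key (placeholder)
    app       = "my-heroku-app"         # Heroku app name (placeholder)

    # 1. dump the local database (same pg_dump flags as above)
    system("PGPASSWORD=#{ENV['DB_PASSWORD']} pg_dump -Fc --no-acl --no-owner " \
           "-h localhost -U #{ENV['DB_USER']} #{ENV['DB_NAME']} > #{dump_path}") or abort "pg_dump failed"

    # 2. upload the dump to S3 with fog and make it publicly readable
    storage = Fog::Storage.new(
      :provider              => 'AWS',
      :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
      :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
    )
    storage.directories.get(bucket).files.create(
      :key    => key,
      :body   => File.open(dump_path),
      :public => true
    )

    # 3. restore the Heroku database from the public S3 URL
    system("heroku pgbackups:restore 'http://s3.amazonaws.com/#{bucket}/#{key}' " \
           "--app #{app} --confirm #{app}") or abort "restore failed"

    # 4. clean up the local dump
    File.delete(dump_path) if File.exist?(dump_path)
  end
end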

Another thing you might try is to use a worker (DelayedJob). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second request timeout, but I am not sure about the memory usage.
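Delayed::Job doesn't ship with a recurring schedule, so one pattern I've seen (sketched below with a made-up job class, inside a Rails app so ActiveSupport's 24.hours is available) is to have the job re-enqueue itself when it finishes:

# A sketch of a self-rescheduling job; the class name is made up.
class NightlyImportJob
  def perform
    # the download / unzip / import / cleanup work goes here,
    # e.g. the same logic the rake task above performs
  ensure
    # queue the next run a day from now, whether or not this run succeeded
    # (hash options like :run_at are the delayed_job 3.x style)
    Delayed::Job.enqueue(NightlyImportJob.new, :run_at => 24.hours.from_now)
  end
end

# start the cycle once, e.g. from the Rails console:
# Delayed::Job.enqueue(NightlyImportJob.new)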
