Periodically importing data from files on Heroku

Posted on 2024-11-11 17:29:44


I need to periodically import some data into my Rails app on Heroku.

The task to execute is split into the following parts (a rough sketch follows the list):
* download a big zip file (e.g. ~100 MB) from a website
* unzip the file (the unzipped data is ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* clean up
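For reference, here is a rough sketch of those steps in Ruby, assuming the zip contains CSV files; the URL, paths, the Product model, and the sku column are made-up placeholders, and the unzip step uses the rubyzip gem:

require 'open-uri'
require 'zip'        # rubyzip gem
require 'csv'
require 'fileutils'

zip_url  = 'http://example.com/data.zip'   # placeholder URL
zip_path = 'tmp/data.zip'
work_dir = 'tmp/import'

# 1. download the big zip file (reads the whole ~100 MB into memory; fine for a one-off task)
File.open(zip_path, 'wb') { |f| f.write(URI.open(zip_url).read) }

# 2. unzip it into a working directory
FileUtils.mkdir_p(work_dir)
Zip::File.open(zip_path) do |zip|
  zip.each do |entry|
    dest = File.join(work_dir, entry.name)
    FileUtils.mkdir_p(File.dirname(dest))
    entry.extract(dest) unless File.exist?(dest)
  end
end

# 3. read the extracted files and create or update records
# (hypothetical model/column; assumes the CSV headers match column names)
Dir.glob(File.join(work_dir, '**', '*.csv')).each do |csv_file|
  CSV.foreach(csv_file, headers: true) do |row|
    record = Product.find_or_initialize_by(sku: row['sku'])
    record.update(row.to_h)
  end
end

# 4. cleanup
FileUtils.rm_rf(work_dir)
File.delete(zip_path)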

How can I do this on Heroku? Is it better to use some external storage (e.g. S3)?
How would you approach such a thing?

Ideally this needs to run every night.


Comments (1)

扎心 2024-11-18 17:29:44


I tried the exact same thing a couple of days back, and the conclusion I came to was that it can't be done, because of the memory limit Heroku imposes on each process. (I was building a data structure from the files I read from the internet and trying to push it to the DB.)

I was using a rake task that would pull and parse a couple of big files and then populate the database.

As a workaround, I now run that rake task on my local machine, push the resulting database dump to S3, and issue a heroku command from my local machine to restore the Heroku DB instance.

"heroku pgbackups:restore 'http://s3.amazonaws.com/#{yourfilepath}' --app  #{APP_NAME} --confirm #{APP_NAME}"

You could push the dump to S3 using the fog library:

require 'rubygems'
require 'fog'

# open a connection to S3
connection = Fog::Storage.new(
    :provider              => 'AWS',
    :aws_secret_access_key => "#{YOUR_SECRET}",
    :aws_access_key_id     => "#{YOUR_ACCESS_KEY}"
)

# look up the bucket that holds the backups
directory = connection.directories.get("#{YOUR_BACKUP_DIRECTORY}")

# upload the dump file and make it publicly readable
file = directory.files.create(
    :key    => "#{REMOTE_FILE_NAME}",
    :body   => File.open("#{LOCAL_BACKUP_FILE_PATH}"),
    :public => true
)

The command that I use to create the database dump on my local machine is:

system "PGPASSWORD=#{YOUR_DB_PASSWORD} pg_dump -Fc --no-acl --no-owner -h localhost -U #{YOUR_DB_USER_NAME} #{YOUR_DB_DATABASE_NAME} > #{LOCAL_BACKUP_FILE_PATH}"

I have written a rake task that automates all of these steps.
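For what it's worth, a minimal sketch of such a rake task might look like the following; the file name, bucket, app name, paths, and ENV variables are placeholders, and it just strings together the pg_dump, fog upload, and pgbackups:restore steps shown above:

# lib/tasks/push_db.rake -- hypothetical file and task names
namespace :db do
  desc "Dump the local DB, upload the dump to S3, and restore it on Heroku"
  task :push_to_heroku do
    require 'fog'

    dump_path = "tmp/latest.dump"       # local dump file (placeholder)
    bucket    = "my-backup-bucket"      # S3 bucket (placeholder)
    key       = "backups/latest.dump"   # S3 object key (placeholder)
    app       = "my-heroku-app"         # Heroku app name (placeholder)

    # 1. dump the local database (same pg_dump flags as above)
    system("PGPASSWORD=#{ENV['DB_PASSWORD']} pg_dump -Fc --no-acl --no-owner " \
           "-h localhost -U #{ENV['DB_USER']} #{ENV['DB_NAME']} > #{dump_path}") or abort "pg_dump failed"

    # 2. upload the dump to S3 with fog and make it publicly readable
    storage = Fog::Storage.new(
      :provider              => 'AWS',
      :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
      :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
    )
    storage.directories.get(bucket).files.create(
      :key    => key,
      :body   => File.open(dump_path),
      :public => true
    )

    # 3. restore the Heroku database from the public S3 URL
    system("heroku pgbackups:restore 'http://s3.amazonaws.com/#{bucket}/#{key}' " \
           "--app #{app} --confirm #{app}") or abort "restore failed"

    # 4. clean up the local dump
    File.delete(dump_path) if File.exist?(dump_path)
  end
end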

Another thing you might try is to use a worker (DelayedJob). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second request timeout, but I am not sure about the memory usage.
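Delayed::Job doesn't ship with a recurring schedule, so one pattern I've seen (sketched below with a made-up job class, inside a Rails app so ActiveSupport's 24.hours is available) is to have the job re-enqueue itself when it finishes:

# A sketch of a self-rescheduling job; the class name is made up.
class NightlyImportJob
  def perform
    # the download / unzip / import / cleanup work goes here,
    # e.g. the same logic the rake task above performs
  ensure
    # queue the next run a day from now, whether or not this run succeeded
    # (hash options like :run_at are the delayed_job 3.x style)
    Delayed::Job.enqueue(NightlyImportJob.new, :run_at => 24.hours.from_now)
  end
end

# start the cycle once, e.g. from the Rails console:
# Delayed::Job.enqueue(NightlyImportJob.new)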
