Periodically import data from files on Heroku
I need to periodically import some data into my Rails app on Heroku.
The task to execute is split into the following parts (a sketch of these steps follows the list):
* download a big zip file (e.g. ~100 MB) from a website
* unzip the file (the unzipped contents are ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* cleanup
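For concreteness, a minimal sketch of those four steps as a rake task might look like the following; the URL, the CSV layout, and the DataImporter model are hypothetical placeholders, and URI.open assumes Ruby 2.5+:

    require 'open-uri'
    require 'fileutils'

    namespace :data do
      desc 'Download, unzip, and import the nightly dump, then clean up'
      task import: :environment do
        url      = 'https://example.com/dump.zip'   # hypothetical source URL
        zip_path = Rails.root.join('tmp', 'dump.zip')
        work_dir = Rails.root.join('tmp', 'dump')

        # 1. Stream the ~100 MB zip straight to disk instead of into memory
        File.open(zip_path, 'wb') do |local|
          URI.open(url) { |remote| IO.copy_stream(remote, local) }
        end

        # 2. Unzip (~1.5 GB unpacked); shelling out keeps Ruby's memory flat
        FileUtils.mkdir_p(work_dir)
        system('unzip', '-o', zip_path.to_s, '-d', work_dir.to_s) or raise 'unzip failed'

        # 3. Read the files and create/update records through the models
        Dir.glob("#{work_dir}/*.csv").each do |path|  # assuming CSV contents
          DataImporter.import_file(path)              # hypothetical importer
        end

        # 4. Cleanup
        FileUtils.rm_rf([zip_path.to_s, work_dir.to_s])
      end
    end

Note that a Heroku dyno's filesystem is ephemeral and its memory is capped, so both the ~1.5 GB of unzipped files and the import's working set have to fit within those limits; that is exactly the problem the answer below runs into.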
How can I do this on Heroku? Is it better to use some external storage (e.g. S3)?
How would you approach such a thing?
Ideally this needs to run every night.
1 Answer
I tried the exact same thing a couple of days back, and the conclusion I came to was that this can't be done because of the memory limits that Heroku imposes on each process. (I was building a data structure from the files I read from the internet and trying to push it to the DB.)
I was using a rake task that would pull and parse a couple of big files and then populate the database.
As a workaround, I now run this rake task on my local machine, push the database to S3, and issue a heroku command from my local machine to restore the Heroku DB instance.
You could push to S3 using the fog library.
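A minimal upload sketch with fog might look like this; the bucket and key names are placeholders, and the AWS credentials are read from the environment:

    require 'fog'  # 'fog-aws' in more recent setups

    storage = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )

    # Upload the local dump to a bucket (hypothetical names)
    bucket = storage.directories.get('my-backup-bucket')
    bucket.files.create(
      key:    'backups/mydb.dump',
      body:   File.open('mydb.dump'),
      public: false
    )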
The command that I use to make a pgbackup on my local machine is:
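Assuming the standard PostgreSQL and Heroku tooling (database, bucket, and app names are placeholders), a typical dump/restore pair looks like this; older toolbelts used the pgbackups add-on's heroku pgbackups:restore instead:

    # Dump the local database in PostgreSQL's custom format
    pg_dump -Fc --no-acl --no-owner -h localhost -U myuser mydb > mydb.dump

    # Restore the Heroku database from the dump uploaded to S3
    heroku pg:backups:restore 'https://my-backup-bucket.s3.amazonaws.com/backups/mydb.dump' DATABASE_URL --app myapp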
I have set up a rake task that automates all these steps.
Another thing you might try is using a worker (DelayedJob). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second limit, but I am not sure about the memory usage.
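One way to make a DelayedJob recur is to have it re-enqueue itself; a minimal sketch in a Rails context, using delayed_job 3.x's hash options and a hypothetical DataImport call:

    # A self-re-enqueueing job: each run schedules the next one 24 hours out.
    class NightlyImportJob
      def perform
        DataImport.run!   # hypothetical: the import logic from the rake task
      ensure
        Delayed::Job.enqueue(NightlyImportJob.new, run_at: 24.hours.from_now)
      end
    end

    # Kick off the cycle once, e.g. from a console:
    Delayed::Job.enqueue(NightlyImportJob.new, run_at: Date.tomorrow.midnight)

Worker dynos do avoid the 30-second router timeout, which only applies to web requests, but the per-dyno memory limit still applies.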