Background jobs on Amazon Web Services

Published 2024-11-27 14:28:55


I am new to AWS, so I need some advice on how to correctly set up background jobs. I've got some data (about 30 GB) that I need to:

a) download from another server; it is a set of zip archives whose links are published in an RSS feed

b) decompress into S3

c) process each file (or sometimes a group of decompressed files), perform data transformations, and store the results in SimpleDB/S3

d) repeat indefinitely as the RSS feed updates

Can someone suggest a basic architecture for a proper solution on AWS?

Thanks.

Denis

Comments (4)

若沐 2024-12-04 14:28:56

I think deploying your code on Elastic Beanstalk will do the job for you at scale, because you are processing a huge chunk of data here and a single plain EC2 instance might max out its resources (mostly memory). The AWS SQS idea of batching the processing will also help optimize the flow and manage timeouts on the server side.
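
If the SQS route is taken, a minimal worker sketch might look like the following (Python/boto3, shown only for illustration; the queue name feed-jobs and the process() helper are hypothetical):

```python
import boto3

# Hypothetical queue holding one message per archive/file to transform.
sqs = boto3.client("sqs", region_name="us-east-1")  # adjust the region as needed
queue_url = sqs.get_queue_url(QueueName="feed-jobs")["QueueUrl"]

def process(body: str) -> None:
    """Placeholder for the actual data transformation."""
    print("processing", body)

while True:
    # Receive up to 10 messages per call (the SQS batch limit) with long polling.
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
        VisibilityTimeout=900,  # give each batch up to 15 minutes before redelivery
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Delete only after successful processing so failed work is retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Deleting a message only after the work succeeds is what lets the SQS visibility timeout handle worker crashes or timeouts.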

倒带 2024-12-04 14:28:55

I think you should run an EC2 instance to perform all the tasks you need and shut it down when done; that way you pay only for the time the instance runs. Depending on your architecture you might need to keep it running all the time, but small instances are very cheap.
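
A minimal sketch of the "shut down when done" idea, assuming the job runs on the instance itself and the instance's IAM role allows ec2:StopInstances; the run_job() helper is a placeholder:

```python
import urllib.request

import boto3

def current_instance_id() -> str:
    # EC2 instance metadata service (IMDSv1 shown for brevity).
    with urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ) as resp:
        return resp.read().decode()

def run_job() -> None:
    """Placeholder: download the archives, decompress, transform, store."""

if __name__ == "__main__":
    run_job()
    ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust the region as needed
    # Stop (not terminate) the instance so compute charges stop until the next run.
    ec2.stop_instances(InstanceIds=[current_instance_id()])
```

Starting the instance again on a schedule can then be handled by a small controller script or cron job running elsewhere.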

我做我的改变 2024-12-04 14:28:55

download from another server; it is a set of zip archives whose links are published in an RSS feed

You can use wget.

decompress into S3

Try s3-tools (github.com/timkay/aws/raw/master/aws).

process each file (or sometimes a group of decompressed files), perform data transformations, and store the results in SimpleDB/S3

Write your own bash script.

repeat indefinitely as the RSS feed updates

One more bash script to check for updates, run via cron.
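
The answer suggests wget, s3-tools, and bash; as a self-contained alternative, here is a rough Python/boto3 sketch of the same download-and-decompress steps. The feed URL, bucket name, and the assumption that each <item><link> points directly at a zip archive are all hypothetical:

```python
import io
import urllib.request
import xml.etree.ElementTree as ET
import zipfile

import boto3

FEED_URL = "https://example.com/feed.rss"  # hypothetical feed
BUCKET = "my-decompressed-data"            # hypothetical bucket

def fetch_zip_links(feed_url: str) -> list[str]:
    # Assumes a plain RSS 2.0 feed: <rss><channel><item><link>...</link></item>...
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.fromstring(resp.read())
    return [link.text for link in root.findall("./channel/item/link") if link.text]

def unzip_to_s3(url: str, s3) -> None:
    # Download one archive into memory and upload each member to S3.
    # Fine for modest archives; very large ones would want streaming to disk.
    with urllib.request.urlopen(url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    for name in archive.namelist():
        s3.put_object(Bucket=BUCKET, Key=name, Body=archive.read(name))

if __name__ == "__main__":
    s3 = boto3.client("s3", region_name="us-east-1")  # adjust the region as needed
    for link in fetch_zip_links(FEED_URL):
        unzip_to_s3(link, s3)
```

Running this from cron, as the answer suggests, covers the "repeat on RSS updates" step; skipping links that have already been downloaded (for example, by checking which keys already exist in S3) is left out for brevity.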

抠脚大汉 2024-12-04 14:28:55

First off, write some code that does a) through c). Test it, etc.

If you want to run the code periodically, it's a good candidate for a background-process workflow: add a job to a queue, and remove it from the queue when it's deemed complete. Every hour or so, add a new job to the queue that means "go fetch the RSS updates and decompress them".

You can do it by hand using AWS Simple Queue Service or any other background job processing service / library. You'd set up a worker instance on EC2 or any other hosting solution that will poll the queue, execute the task, and poll again, forever.

It may be easier to use Amazon Simple Workflow Service, which seems to be intended for what you're trying to do (automated workflows). Note: I've never actually used it.
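
A minimal sketch of the "add a job every hour" side, assuming an SQS queue named rss-jobs already exists; a worker (like the SQS loop sketched under the first answer) would poll it, do the fetch/decompress/transform, and delete the message only once the work is deemed complete:

```python
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # adjust the region as needed
queue_url = sqs.get_queue_url(QueueName="rss-jobs")["QueueUrl"]  # hypothetical queue

# Run this from cron (or another scheduler) roughly once an hour.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"task": "fetch-rss-and-decompress"}),
)
```

If a worker crashes mid-task, the undeleted message becomes visible again after the queue's visibility timeout and is retried.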
