Sync, schedule and execute a Node.js script on AWS
We usually store our code on GitHub and then deploy it to AWS Lambda.
We now have a specific Node.js script that does not fit this setup:
- It takes roughly an hour to run, so we can't deploy it on Lambda.
- It needs to run just once a month.
- Once in a while we'll update the script in our GitHub repository, and we want the script in AWS to stay in sync when we make changes (e.g. using a pipeline).
- The script copies files from S3 and processes them locally. It does some heavy lifting with data.
What would be the recommended way to set this up on AWS?
Comments (1)
The serverless approach fits nicely, since the job runs only once per month. Data transfer between Lambda and S3 in the same region is free. If Lambda suits your use case apart from the execution time limit, and you can track the progress of the processing, you can create a Step Functions state machine that invokes your Lambda in a loop until all S3 data chunks have been processed. Each Lambda execution can take up to 15 minutes, and a state machine execution can run far longer than 1 hour. Regarding ops, you can set up a trigger on your GitHub repository that publishes a new version of the Lambda when the script changes. You can use AWS CloudFormation, CDK or any other suitable tool for that.
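To make the loop idea more concrete, here is a minimal sketch of a resumable handler plus a matching state machine definition. It is only an illustration under several assumptions: the names `makeHandler`, `listKeys`, `processKey`, `SAFETY_MARGIN_MS`, the `{ done, cursor, processed }` event shape, the function name `monthly-job` and the placeholder ARN are all hypothetical and not part of the original question. The only Lambda API used is `context.getRemainingTimeInMillis()`. The S3 access is injected as plain functions so the sketch runs without the AWS SDK.

```javascript
// Stop taking new pages once less than this much time is left (hypothetical value).
const SAFETY_MARGIN_MS = 60_000;

// Builds a Lambda-style handler from two injected functions (both hypothetical):
//   listKeys(cursor)  -> Promise<{ keys: string[], nextCursor: string | null }>
//   processKey(key)   -> Promise<void>
function makeHandler({ listKeys, processKey }) {
  return async function handler(event = {}, context) {
    // Resume from the cursor returned by the previous invocation, if any.
    let cursor = event.cursor ?? null;
    let processed = event.processed ?? 0;

    do {
      const page = await listKeys(cursor);
      // A page is always processed in full, so keep pages small enough
      // to finish well within SAFETY_MARGIN_MS.
      for (const key of page.keys) {
        await processKey(key);
        processed += 1;
      }
      cursor = page.nextCursor ?? null;
    } while (
      cursor !== null &&
      context.getRemainingTimeInMillis() > SAFETY_MARGIN_MS
    );

    // The return value becomes the input of the next loop iteration.
    return { done: cursor === null, cursor, processed };
  };
}

// Amazon States Language definition for the loop, written as a JS object.
// The ARN is a placeholder, not a real resource.
const stateMachineDefinition = {
  StartAt: "ProcessChunk",
  States: {
    ProcessChunk: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:monthly-job",
      Next: "IsDone",
    },
    IsDone: {
      Type: "Choice",
      Choices: [{ Variable: "$.done", BooleanEquals: true, Next: "Finished" }],
      Default: "ProcessChunk", // not done yet: invoke the Lambda again
    },
    Finished: { Type: "Succeed" },
  },
};
```

In a real deployment, `listKeys` would presumably wrap an S3 listing call (for example `ListObjectsV2` with its continuation token as the cursor), and the function returned by `makeHandler` would be exported as the Lambda handler. The monthly run could then be started by a schedule that triggers the state machine.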