AWS Glue or Fargate? Moving data from S3 to RDS

Posted on 2025-02-09 23:57:30

I'm trying to decide which to use, AWS Glue or Fargate, and would appreciate any advice or previous experience.

Our use case is to move data from S3 to RDS on a schedule (likely every hour). There would be around 200-400k files per hour, each small (roughly 5-20 KB), for up to 8 GB in total. In a burst case we could see up to 1 million messages in an hour, which comes to about 20 GB. The files in S3 are JSON, and we would like to do some transformation and batch-write the results to RDS. There will be multiple S3 buckets, each writing to a different RDS table. We have existing Java libraries for message transformation and SQL statement generation, which we would like to reuse if possible.
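To make the write side concrete, here is a minimal sketch of the batch write, assuming a single raw-payload column and placeholder connection details; in practice the existing Java library would generate the real SQL and bind values:

```scala
import java.sql.{Connection, DriverManager}

// Minimal sketch of the "transform and batch write to RDS" step, assuming a
// single raw-payload column; the existing Java library would produce the real
// SQL and bind values. Table name and connection details are placeholders.
object RdsBatchWriter {
  def writeBatch(jsonMessages: Seq[String], jdbcUrl: String, user: String, password: String): Unit = {
    val conn: Connection = DriverManager.getConnection(jdbcUrl, user, password)
    try {
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement("INSERT INTO events_raw (payload) VALUES (?)")
      // JDBC batching keeps hundreds of thousands of small rows per hour down
      // to a few round trips per batch instead of one per row.
      jsonMessages.foreach { msg =>
        stmt.setString(1, msg)
        stmt.addBatch()
      }
      stmt.executeBatch()
      conn.commit()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```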

So currently I'm considering 2 paths: Glue or Fargate.

With Glue, we could use a Scala script and depend on our Java library for the transformation. That way we could leverage (1) Glue triggers to schedule the job, (2) Spark for distributed processing, and (3) Glue job bookmarks to automatically detect which data still needs to be processed. The downside is that it might be a bit more expensive.
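For reference, a minimal sketch of what the Glue path might look like for one bucket-to-table mapping, assuming a Scala script. The bucket, RDS endpoint, table name, and credentials are placeholders; the transformationContext together with Job.init/Job.commit is what job bookmarks use to remember already-processed objects:

```scala
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.{GlueArgParser, Job, JsonOptions}
import org.apache.spark.SparkContext
import scala.collection.JavaConverters._

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val sc = new SparkContext()
    val glueContext = new GlueContext(sc)
    val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)

    // Read the hourly JSON drops; the transformationContext is what the job
    // bookmark uses to skip objects processed in earlier runs.
    val source = glueContext.getSourceWithFormat(
      connectionType = "s3",
      options = JsonOptions("""{"paths": ["s3://example-bucket/incoming/"], "recurse": true}"""),
      transformationContext = "source0",
      format = "json"
    ).getDynamicFrame()

    // The existing Java transformation library could be hooked in here,
    // e.g. mapped over the rows of the DataFrame.
    val transformed = source.toDF()

    transformed.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://example-rds:5432/appdb") // placeholder endpoint
      .option("dbtable", "events")
      .option("user", "app_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("append")
      .save()

    Job.commit()
  }
}
```

A Glue trigger with an hourly schedule would then start this job, with one job (or one job parameterized per bucket) covering each bucket-to-table mapping.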

With Fargate it's definitely still doable, but we would need to handle all three of the above ourselves; in exchange we get more flexibility and a lower cost.
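To illustrate part of what "doing it ourselves" means, here is a minimal sketch of the hourly listing step, assuming objects land under a predictable hourly prefix; scheduling would come from something like an EventBridge rule starting the task, and tracking what has already been processed would still need its own state (e.g. a checkpoint table):

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request
import scala.collection.JavaConverters._

// Sketch of the bookkeeping the Fargate task would own itself: list one
// hourly prefix and feed each key to the transform + write step.
// Bucket name and prefix layout are assumptions for illustration.
object S3HourlyPoller {
  def keysForHour(bucket: String, hourPrefix: String): List[String] = {
    val s3 = S3Client.create()
    val request = ListObjectsV2Request.builder()
      .bucket(bucket)
      .prefix(hourPrefix) // e.g. "incoming/2025/02/09/14/"
      .build()
    // The paginator follows continuation tokens; 200-400k keys per hour means
    // a few hundred list pages of 1000 keys each.
    s3.listObjectsV2Paginator(request)
      .contents()
      .asScala
      .map(_.key())
      .toList
  }
}
```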

Overall, if anyone has similar experience, how do these two choices compare in terms of the development and maintenance effort needed?

Comments (1)

情话已封尘 2025-02-16 23:57:30

Have you done a deep cost analysis? Glue can get a bit pricey, while Fargate is fairly low cost. The highest cost here could be the ListBucket (object listing) requests, which both Glue and Fargate would likely need (unless the object names are predictable?).

What is the source of the S3 files? Could they be piped to RDS using Firehose instead, with a backup delivery to S3 if you still want the objects deposited there? There are always a few ways to do this type of work. Another option would be triggering a Java Lambda from the files landing in S3, so the data is copied over continuously.
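To illustrate that last suggestion, here is a minimal sketch of an S3-event-triggered handler. The answer suggests Java; Scala is used here only to match the earlier sketches, RdsBatchWriter is the hypothetical write helper from the question, and all connection details are placeholders:

```scala
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.S3Event
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.GetObjectRequest
import scala.collection.JavaConverters._

// Invoked once per S3 object notification: read the JSON file and hand it to
// the transform + batch-write step.
class S3ToRdsHandler extends RequestHandler[S3Event, String] {
  private val s3 = S3Client.create()

  override def handleRequest(event: S3Event, context: Context): String = {
    event.getRecords.asScala.foreach { record =>
      val bucket = record.getS3.getBucket.getName
      // Keys arrive URL-encoded in S3 event notifications; decoding is
      // omitted here for brevity.
      val key = record.getS3.getObject.getKey
      val body = s3.getObjectAsBytes(
        GetObjectRequest.builder().bucket(bucket).key(key).build()
      ).asUtf8String()
      RdsBatchWriter.writeBatch(
        Seq(body),
        "jdbc:postgresql://example-rds:5432/appdb", // placeholder endpoint
        "app_user",
        sys.env.getOrElse("DB_PASSWORD", ""))
    }
    "ok"
  }
}
```

At 200-400k objects per hour this also means that many invocations, so Lambda concurrency limits and database connection management (e.g. pooling or RDS Proxy) would need a look.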
