I have been reading about Kinesis vs SQS differences and when to use each but I'm struggling to know which is the appropriate solution for this particular problem:
- Strava-like app where users record their runs
- 50 incoming runs per second
- The processing of each run takes exactly 1 minute
- I want the user to have their results in less than 5 minutes
- A run is just a GUID; the job that processes it will get all the info from S3
If I understand correctly, in Kinesis you can have 1 worker per shard, correct? That would mean 1 run per minute per shard. Since I have 3000 incoming runs per minute, to meet the 5 minute deadline would mean I would need to have 600 shards with 1 worker each.
Is this assumption correct?
With SQS I can just have 1 queue and as many workers as I like, up to SQS's limit of 120,000 inflight messages.
- If 1 run errors during processing I want to reprocess it a few more times and then store it for further inspection.
- I don't need to process messages in order, and duplicates are totally fine.
Comments (1)
In that case, a queuing service such as SQS should be used. Kinesis is a streaming service that persists data. This means that multiple workers can read messages from a stream for as long as the messages are still valid. None of your workers would be able to remove a message from the stream.
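On the sizing arithmetic in the question, a rough check may help. The sketch below assumes a steady arrival rate and a fixed 1-minute processing time, and uses Little's law (average jobs in progress = arrival rate × processing time). The function names are hypothetical and only serve the illustration.

```python
import math


def required_workers(arrivals_per_second: float, seconds_per_job: float) -> int:
    """Little's law: jobs in progress at steady state = arrival rate * service time."""
    return math.ceil(arrivals_per_second * seconds_per_job)


def backlog_after(minutes: float, arrivals_per_minute: float,
                  workers: int, minutes_per_job: float = 1.0) -> float:
    """Unprocessed runs left after `minutes`, assuming each worker handles one job at a time."""
    processed_per_minute = workers / minutes_per_job
    return max(0.0, (arrivals_per_minute - processed_per_minute) * minutes)


# 50 runs/s, 60 s per run: workers needed just to keep up with arrivals.
print(required_workers(50, 60))       # 3000

# With only 600 single-job workers, the backlog after one hour:
print(backlog_after(60, 3000, 600))   # 144000.0
```

Under these assumptions, roughly 3000 runs are being processed at any moment regardless of the 5-minute deadline, so 600 single-job workers would fall further behind every minute. The deadline only adds slack for queueing delay on top of the 1 minute of processing. About 3000 in-flight messages is also well below the 120,000 in-flight limit mentioned in the question.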
Also, with SQS you can set up dead-letter queues, which would allow you to capture messages that fail to process after a predefined number of attempts.
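As a minimal sketch of what that could look like, the snippet below builds the `RedrivePolicy` queue attribute that points a source queue at a dead-letter queue. The ARN, the retry count of 3, and the helper names are assumptions for illustration. `attach_dlq` needs the third-party `boto3` package and AWS credentials, so it is defined but not called here.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the SQS attribute that moves a message to the dead-letter queue
    once it has been received `max_receive_count` times without being deleted."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


def attach_dlq(queue_url: str, dlq_arn: str, max_receive_count: int = 3) -> None:
    """Apply the redrive policy to an existing queue (not called in this sketch)."""
    import boto3  # third-party; imported lazily so the helper above runs without it

    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_attributes(dlq_arn, max_receive_count),
    )


# Hypothetical ARN, used only to show the shape of the attribute.
print(redrive_attributes("arn:aws:sqs:us-east-1:123456789012:runs-dlq"))
```

A worker that fails on a run simply does not delete the message; it becomes visible again after the visibility timeout, and after the configured number of receives SQS moves it to the dead-letter queue for later inspection.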