Parsing and processing 1M+ rows
We are writing a service that parses CSV files containing 1M+ rows. Each row needs to be inserted into (or updated in) the database as a record. We are currently using DynamoDB.
Which AWS services are best suited for this? We are considering Lambda and a queue system.
The solution we are considering:
- API: An API accepts the CSV file upload, reads the file in chunks, and pushes each chunk onto a queue for processing (see the first sketch after this list).
- QUEUE: The queue holds chunks of the original file as messages to be processed.
- PROCESSOR: The chunk processor consumes messages from the queue and inserts the records into DynamoDB (second sketch below).
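
A minimal sketch of the chunking step, assuming Python with boto3; the queue URL, the chunk size, and the idea of returning the chunk count for job tracking are all assumptions, not part of the original design:

```python
import csv
import io
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/.../csv-chunks"  # hypothetical queue
CHUNK_SIZE = 200  # rows per message; keep the JSON well under SQS's 256 KB limit

def enqueue_csv(file_obj) -> int:
    """Stream an uploaded CSV and push fixed-size row chunks to SQS.

    Returns the number of chunks sent, which a job tracker can
    record as total_chunks (used for completion detection below).
    """
    reader = csv.DictReader(io.TextIOWrapper(file_obj, encoding="utf-8"))
    chunk, sent = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(chunk))
            chunk, sent = [], sent + 1
    if chunk:  # trailing partial chunk
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(chunk))
        sent += 1
    return sent
```

Streaming keeps memory flat even at 1M+ rows; many designs instead land large uploads in S3 first and enqueue pointers or byte ranges, but this sketch follows the upload-through-the-API flow described above.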
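And a matching sketch of the chunk processor as an SQS-triggered Lambda; the `records` table name and its `id` partition key are assumptions:

```python
import json
import boto3

table = boto3.resource("dynamodb").Table("records")  # hypothetical table, keyed on "id"

def handler(event, context):
    """Lambda entry point for an SQS event source.

    Each message body is assumed to be a JSON array of row dicts,
    each carrying the table's partition key.
    """
    for message in event["Records"]:
        rows = json.loads(message["body"])
        # batch_writer buffers writes and retries unprocessed items;
        # a plain put_item overwrites the whole item, which gives the
        # insert-or-update behaviour the question asks for.
        with table.batch_writer(overwrite_by_pkeys=["id"]) as batch:
            for row in rows:
                batch.put_item(Item=row)
```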
Challenges:
- How would you trigger an end-of-processing notification (to indicate that the last chunk for a given request has completed)? One pattern is sketched below.
- How would you handle partial failures and rollback with this approach? See the second sketch after this list.
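
For the first challenge, one common pattern is an atomic counter: the API stores `total_chunks` for the job, each processor increments `processed_chunks`, and whichever invocation reaches the total fires the notification. A sketch, where the `csv_jobs` tracking table and the SNS topic are hypothetical:

```python
import boto3

jobs = boto3.resource("dynamodb").Table("csv_jobs")  # hypothetical job-tracking table
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:csv-jobs-done"  # hypothetical topic

def mark_chunk_done(job_id: str) -> None:
    """Atomically count a finished chunk and notify on the last one.

    Assumes the API wrote {"job_id": ..., "total_chunks": N} before
    enqueueing. ADD is atomic, so two concurrent processors cannot
    both observe the same intermediate count.
    """
    result = jobs.update_item(
        Key={"job_id": job_id},
        UpdateExpression="ADD processed_chunks :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="ALL_NEW",
    )
    item = result["Attributes"]
    if item["processed_chunks"] == item["total_chunks"]:
        sns.publish(TopicArn=TOPIC_ARN, Message=f"CSV job {job_id} complete")
```

One caveat: SQS delivers at least once, so a redelivered chunk inflates the counter and can fire the notification early; storing chunk ids in a DynamoDB string set and comparing its size closes that gap.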
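For the second challenge, DynamoDB offers no rollback across a million rows, so a common approach is to make writes idempotent (the `put_item` upsert above is safe to retry) and lean on retries plus a dead-letter queue instead of rolling back. A sketch extending the processor above; it assumes the SQS event source mapping has `ReportBatchItemFailures` enabled, without which the return value is ignored:

```python
import json
import boto3

table = boto3.resource("dynamodb").Table("records")  # same hypothetical table as above

def handler(event, context):
    """Process a batch but report per-message failures.

    Messages listed in batchItemFailures return to the queue and
    are retried; after maxReceiveCount attempts SQS moves them to
    the dead-letter queue for inspection or replay.
    """
    failures = []
    for message in event["Records"]:
        try:
            rows = json.loads(message["body"])
            with table.batch_writer(overwrite_by_pkeys=["id"]) as batch:
                for row in rows:
                    batch.put_item(Item=row)  # idempotent upsert: retrying is harmless
        except Exception:
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

If a chunk truly must be all-or-nothing, `transact_write_items` offers atomic writes for up to 100 items per call, but nothing spans the whole file, so it is generally better to design for retries rather than rollback.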