Can Amazon S3 store sensor data streams from millions of endpoints?
I am looking for options for reliable (and speedy) storage for small amounts of sensor data that would be coming in from (getting optimistic here) millions of endpoints.
The scale I'm talking about is 1M endpoints, each sending 100 bytes every minute. The data needs to be available for analysis shortly after it arrives.
Additionally, this data will be kept for a few years and may exceed 100TB of total storage.
Is S3 the solution to this, or would I be better off hosting my own NoSQL cluster, such as Cassandra or MongoDB?
Please let me know if I have not specified any information.
Comments (1)
Yes, you could. But S3 has no query mechanism and no way to read multiple objects in one request. You would also have no way to inspect the data before it is written.
This might be a better idea:
It would decouple receipt of the data from any data load/storage phase.
Note that many Amazon services have a per-request charge. For SQS it's $0.01 per 10,000 requests. If you have 1 million clients each writing one message per minute, that is about 43 billion requests a month, so request charges alone would be over $40,000 a month. That doubles once you account for reading the messages.
For S3, it's $0.01 per 1,000 POSTs (client writes) and $0.01 per 10,000 GETs (reads). For 1 million clients, your per-request charges alone could easily reach $500,000 per month.
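To make the arithmetic behind those figures easy to check, here is a rough sketch. It uses only the per-request prices quoted in this answer (which may not match current AWS pricing) and assumes a 30-day month. The function names are made up for illustration.

```python
# Back-of-the-envelope request charges, using the prices quoted above.
# Assumption: a 30-day month; prices are the ones stated in this answer.

MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200 minutes


def monthly_requests(clients: int, per_client_per_minute: int = 1) -> int:
    """Total requests per month for a fleet of clients."""
    return clients * per_client_per_minute * MINUTES_PER_MONTH


def monthly_cost(requests: int, dollars: float, per_requests: int) -> float:
    """Cost when the service charges `dollars` per `per_requests` requests."""
    return requests / per_requests * dollars


requests = monthly_requests(1_000_000)                 # 43.2 billion
sqs_writes = monthly_cost(requests, 0.01, 10_000)      # SQS: $0.01 / 10,000
sqs_total = sqs_writes * 2                             # each message is also read
s3_posts = monthly_cost(requests, 0.01, 1_000)         # S3 POST: $0.01 / 1,000

print(f"requests/month: {requests:,}")
print(f"SQS writes:     ${sqs_writes:,.0f}")
print(f"SQS write+read: ${sqs_total:,.0f}")
print(f"S3 POSTs:       ${s3_posts:,.0f}")
```

With these inputs the SQS write charges come to roughly $43,200 a month and the S3 POST charges to roughly $432,000, before any reads, which is where the "over $40,000" and "could easily reach $500,000" estimates come from.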
Ultimately, at 1 million clients, you will likely need to run your own receiving endpoints for economic reasons alone.