Iterating over all items in SimpleDB
Let's say I have an AWS SimpleDB domain with around 3 million items. Each item has an attribute "foo" whose value is some arbitrary integer (which is of course actually stored in SimpleDB as a string, but let's ignore the conversion to and from for now). I would like to increment the foo value for each item every 60 seconds until it reaches a maximum value, then reset foo to zero: read, increment, evaluate, store. The maximum is not the same for each item; each item's max is stored as another attribute-value in that item.
Given the large number of items, and the hard 60 second time limit, is this approach feasible in SimpleDB? Anyone have an approach to make this work?
Comments (2)
You can do it, but it is not feasible. You can only get between 100 and 300 PUTs per second for a single domain. You can read upwards of 1000 items per second, so writes will be the bottleneck.
To be on the conservative side, let's say 100 store operations per second, per domain. Storing all 3 million items each minute means 50,000 writes per second, so you'd need 500 domains to open up enough throughput. You only get 100 by default, so you'd have to ask for more.
Also it would be expensive. Writes with a small number of attributes are about $3 per million and reads are about $1.30 per million. That's about $13 / minute.
The only thing I can really suggest is finding a way to combine the 3 million items into a smaller number of items. If there were a way to put 50 "items" into each real item, you could do it with 10 domains at about $15.50 / hour. But I still wouldn't call that feasible, since you can get a cluster of 10 Extra Large High-CPU EC2 server instances for $6.80 / hour.
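To make the arithmetic above easy to re-check with other numbers, here is a rough sketch of the estimate. The function name `plan` and its parameters are made up for illustration. The throughput and price figures are simply the ones quoted in this answer, not current AWS pricing. It also assumes a packed item costs the same per request as a small one, which may not hold once an item carries 50 values.

```python
import math

def plan(items, period_s, writes_per_s_per_domain,
         write_usd_per_million, read_usd_per_million, pack=1):
    """Estimate domains needed and cost per update cycle.

    pack is how many logical "items" are stored in each real item
    (hypothetical parameter, 1 means no packing).
    """
    real_items = math.ceil(items / pack)
    # Every real item must be written once per period.
    writes_per_s = real_items / period_s
    domains = math.ceil(writes_per_s / writes_per_s_per_domain)
    # One read and one write per real item per period.
    usd_per_period = real_items / 1_000_000 * (
        write_usd_per_million + read_usd_per_million)
    return domains, usd_per_period

# Figures quoted in the answer: 100 writes/s per domain, $3 and $1.30 per million.
domains, usd_per_minute = plan(3_000_000, 60, 100, 3.00, 1.30)
packed_domains, packed_usd_per_minute = plan(3_000_000, 60, 100, 3.00, 1.30, pack=50)
packed_usd_per_hour = packed_usd_per_minute * 60

print(domains, round(usd_per_minute, 2))
print(packed_domains, round(packed_usd_per_hour, 2))
```

With these inputs the unpacked case needs 500 domains at roughly $13 per minute, and packing 50 per item brings it down to 10 domains at roughly $15.50 per hour, matching the figures above.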
Why not generate the value at read time from a trusted clock? I'm going to make up some names:
So at any time, you can get the value you were proposing to store in an attribute by
(current_time - touch_time) % (max_age * 60)
This assumes max_age changes relatively infrequently and that everyone trusts touch_time and current_time to within a minute, which is what NTP is for.
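As a minimal sketch of the idea, assuming touch_time and current_time are Unix timestamps in seconds and max_age is the item's maximum expressed in minutes (these units are my reading, not something stated above): the expression gives the number of seconds into the current cycle, so floor-dividing by 60 turns it into the once-a-minute counter. The helper names `foo_at` and `touch_time_for` are hypothetical. The sketch also assumes foo wraps to 0 when it would reach max_age, so it takes the values 0 through max_age - 1.

```python
import time

def foo_at(touch_time, max_age, now=None):
    """Derive foo from the clock instead of storing it.

    touch_time: Unix seconds at which foo was 0 (assumed unit).
    max_age: item's maximum, in minutes (assumed unit), must be > 0.
    """
    if now is None:
        now = time.time()
    seconds_into_cycle = (int(now) - int(touch_time)) % (max_age * 60)
    return seconds_into_cycle // 60

def touch_time_for(foo, now=None):
    """Pick a touch_time so that an item currently at `foo` keeps its value."""
    if now is None:
        now = time.time()
    return int(now) - foo * 60

# Example: an item with max_age 5, touched at t=0.
values = [foo_at(0, 5, now=t) for t in (0, 59, 60, 299, 300, 360)]
print(values)
```

Nothing needs to be written every minute with this scheme. touch_time only has to be stored once, and rewritten if max_age changes or foo is reset by hand.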