Creating incremental reports with Azure Table Storage
I need to create incremental reports in Azure Table Storage. I need to be able to update the same records from several different worker role instances (different roles, with several instances each).
My reports consist mainly of values that I need to increment after I parse the raw data I initially stored.
The optimistic solution I found is to use a retry mechanism: try to update the record, and if you get a 412 result code (meaning you no longer have the latest ETag value), retry. This solution becomes less efficient and more costly the more users you have and the more data you need to update simultaneously (exactly my case).
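To make the pattern concrete, here is a minimal sketch of that retry loop, assuming the azure-data-tables Python SDK (the helper name, keys, and property names are illustrative):

```python
# A minimal sketch of the optimistic-concurrency retry loop described above,
# using the azure-data-tables Python SDK. Names and keys are illustrative.
from azure.core import MatchConditions
from azure.core.exceptions import ResourceModifiedError
from azure.data.tables import TableClient, UpdateMode

def increment_with_retry(client: TableClient, pk: str, rk: str,
                         prop: str, delta: int, max_attempts: int = 10) -> None:
    for _ in range(max_attempts):
        entity = client.get_entity(partition_key=pk, row_key=rk)
        entity[prop] = entity.get(prop, 0) + delta
        try:
            # Conditional update: the service returns 412 if the ETag
            # no longer matches, i.e. someone else updated the entity.
            client.update_entity(
                entity,
                mode=UpdateMode.MERGE,
                etag=entity.metadata["etag"],
                match_condition=MatchConditions.IfNotModified,
            )
            return
        except ResourceModifiedError:
            continue  # lost the race; re-read and try again
    raise RuntimeError(f"gave up after {max_attempts} conflicting updates")
```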
Another solution that comes to mind is to have only one instance of one worker role that can update any given record. This is very problematic, because it means I would create bottlenecks in my architecture by design, which is the opposite of the scale I want to reach with Azure.
If anyone here has some best practices in mind for such a use case, I would love to hear them.
2 Answers
Most cloud storage services (Table Storage is one of them) do not offer scalable writes on a single entity/blob/whatever. There is no quick fix for this limitation, as it comes from the core tradeoffs that were made to create cloud storage in the first place.
Basically, a storage unit (entity/blob/whatever) can be updated about once every 20 ms, and that's about it. Having a dedicated worker or not will not change anything about this.
Instead, you need to approach your task from a different angle. For counters, the most usual approach is to use sharded counters (the link is for GAE, but you can implement equivalent behavior on Azure).
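As a rough illustration, a sharded counter on Azure Table Storage could look like the following sketch, assuming the azure-data-tables Python SDK (the shard count and key scheme are arbitrary choices):

```python
# A sketch of a sharded counter on Azure Table Storage: each increment hits a
# randomly chosen shard row, so contention on any single entity drops by
# roughly a factor of NUM_SHARDS. Key scheme and shard count are arbitrary.
import random
from azure.core import MatchConditions
from azure.core.exceptions import (ResourceExistsError, ResourceModifiedError,
                                   ResourceNotFoundError)
from azure.data.tables import TableClient, UpdateMode

NUM_SHARDS = 16  # tune to the write throughput you need

def increment(client: TableClient, counter: str, delta: int = 1) -> None:
    rk = f"shard-{random.randrange(NUM_SHARDS):03d}"
    while True:
        try:
            entity = client.get_entity(partition_key=counter, row_key=rk)
        except ResourceNotFoundError:
            try:
                client.create_entity({"PartitionKey": counter, "RowKey": rk,
                                      "Value": delta})
                return
            except ResourceExistsError:
                continue  # another writer created the shard first; retry
        entity["Value"] += delta
        try:
            client.update_entity(entity, mode=UpdateMode.MERGE,
                                 etag=entity.metadata["etag"],
                                 match_condition=MatchConditions.IfNotModified)
            return
        except ResourceModifiedError:
            continue  # conflicts still happen, but only 1/NUM_SHARDS as often

def read_total(client: TableClient, counter: str) -> int:
    # Reading the counter means summing all of its shard rows.
    return sum(e["Value"] for e in
               client.query_entities(f"PartitionKey eq '{counter}'"))
```

The tradeoff is on the read side: getting the total costs a partition query instead of a single point read, which is the usual price of sharded counters.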
Also, another way to ease the pain is to go for an asynchronous architecture à la CQRS, where the performance constraints you put on the update latency of entities are significantly relaxed.
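One possible shape for that asynchronous route, sketched with the azure-storage-queue Python SDK (the queue name and message format are assumptions):

```python
# A sketch of the asynchronous (CQRS-flavored) route: the write path only
# appends increment events to a queue; a background consumer folds them and
# updates the report entities. Queue name and message shape are assumptions.
import json
from collections import Counter
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<connection string>",
                                           "report-increments")

def record_increment(report_id: str, prop: str, delta: int) -> None:
    # Contention-free write path: appending to a queue never conflicts.
    queue.send_message(json.dumps({"report": report_id, "prop": prop,
                                   "delta": delta}))

def drain_and_apply() -> None:
    # Consumer: collapse many events into one table write per (report, prop).
    totals: Counter = Counter()
    for msg in queue.receive_messages(messages_per_page=32):
        event = json.loads(msg.content)
        totals[(event["report"], event["prop"])] += event["delta"]
        queue.delete_message(msg)
    for (report_id, prop), delta in totals.items():
        ...  # one conditional update per entity, e.g. the retry loop above
```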
I believe the approach needs re-architecting. In order to ensure scalability and limit the amount of contention, you want to make sure that every write can work optimistically, by providing a unique Table/PartitionKey/RowKey for each one.
If you need those values merged together for reports, have a separate process/worker post-aggregate/merge the records for reporting purposes. You can use a queue or a timing mechanism to start the aggregation/merging.
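A minimal sketch of this pattern, assuming the azure-data-tables Python SDK (names and key scheme are illustrative): each write gets a unique RowKey so inserts never conflict, and a single aggregation worker owns the merged entity.

```python
# A sketch of the write-unique-then-aggregate pattern: every raw write gets
# its own GUID RowKey, so inserts never contend, and a single aggregation
# worker owns the merged report entity. Names and key scheme are illustrative.
import uuid
from azure.data.tables import TableClient, UpdateMode

def record_value(client: TableClient, report_id: str,
                 prop: str, value: int) -> None:
    # A unique RowKey guarantees this insert can never hit a 412 conflict.
    client.create_entity({
        "PartitionKey": report_id,
        "RowKey": str(uuid.uuid4()),
        "Prop": prop,
        "Value": value,
    })

def aggregate(client: TableClient, report_id: str) -> dict:
    # Run from a single timer- or queue-triggered worker, so the merged
    # entity has exactly one writer and needs no ETag handling.
    totals: dict = {}
    for e in client.query_entities(f"PartitionKey eq '{report_id}'"):
        totals[e["Prop"]] = totals.get(e["Prop"], 0) + e["Value"]
    client.upsert_entity({"PartitionKey": "reports", "RowKey": report_id,
                          **totals}, mode=UpdateMode.REPLACE)
    return totals
```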