Azure Table Storage - calculate or persist totals
I'm looking into using Table Storage for storing some transactional data; however, I need to support some very high-level reporting over it, basically totals per day / month.
A couple of options I have thought of:
Use a partition / row key structure and dynamically compute the sum
e.g. 20101101_ITEMID_XXXXXXXX (x = guid or time, to make unique)
then I would query for a month's data using a portion of the row key (ITEMID_201011) and total the "Cost" property on the type. How would the query limit of 1000 records be managed with this, though? (i.e. if there are more than 1000 transactions for the day, totaling would be hard)
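As a rough sketch of what option 1 could look like, here is a minimal example using the modern Python azure-data-tables SDK; the connection string, table name, and the PartitionKey = item id / RowKey = yyyyMMdd_guid layout are assumptions for illustration, not part of the original question. The SDK's paged iterator follows continuation tokens for you, so the 1000-record page limit only affects the number of round trips, not the code:

```python
# Sketch of option 1: scan a month's raw transactions and sum "Cost" on the fly.
# Assumes PartitionKey = item id and RowKey = yyyyMMdd_guid, a slight variation
# on the layout in the question -- adjust the filter to match the real schema.
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<connection string>", table_name="Transactions"
)

# Range query on RowKey covers all of November 2010 for one item.
entities = table.query_entities(
    query_filter="PartitionKey eq @item and RowKey ge @lo and RowKey lt @hi",
    parameters={"item": "ITEMID", "lo": "20101101", "hi": "20101201"},
    select=["Cost"],
)

# The iterator transparently follows continuation tokens, so more than 1000
# matching rows simply means more round trips (and more latency and cost).
month_total = sum(float(e["Cost"]) for e in entities)
print(month_total)
```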
Use another record to store the total for the day, and update this as new records are added
e.g. row key "20101101_ITEMID_TOTAL"
then query off this for the day's totals, or the month's or year's totals.
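And a comparable sketch of option 2, under the same assumptions: the daily total lives in its own entity and is read-modify-written with an ETag check so concurrent writers don't silently overwrite each other (table storage has no server-side increment, hence the retry loop). The entity and property names here are illustrative only:

```python
# Sketch of option 2: keep a pre-aggregated daily total in its own entity and
# bump it whenever a new transaction is written. Names are illustrative.
from azure.core import MatchConditions
from azure.core.exceptions import (
    HttpResponseError,
    ResourceExistsError,
    ResourceNotFoundError,
)
from azure.data.tables import TableClient, UpdateMode

table = TableClient.from_connection_string(
    conn_str="<connection string>", table_name="Transactions"
)

def add_to_daily_total(item_id: str, day: str, cost: float) -> None:
    """Add `cost` to the running total entity for (item_id, day)."""
    row_key = f"{day}_{item_id}_TOTAL"
    while True:
        try:
            current = table.get_entity(partition_key=item_id, row_key=row_key)
        except ResourceNotFoundError:
            # First transaction of the day: create the total row.
            try:
                table.create_entity(
                    {"PartitionKey": item_id, "RowKey": row_key, "Total": cost}
                )
                return
            except ResourceExistsError:
                continue  # another writer created it first; retry as an update
        current["Total"] = float(current["Total"]) + cost
        try:
            # Conditional replace: fails with 412 if someone else updated the
            # entity since we read it, in which case we re-read and retry.
            table.update_entity(
                entity=current,
                mode=UpdateMode.REPLACE,
                etag=current.metadata["etag"],
                match_condition=MatchConditions.IfNotModified,
            )
            return
        except HttpResponseError as err:
            if err.status_code == 412:
                continue
            raise
```

The trade-off is an extra write (and a possible retry) per transaction in exchange for totals that can be read with a single point lookup.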
What is the best way to do this? Is there a 'best practice' for this type of requirement using table storage?
Comments (1)
I'm not sure what the best practice is, but I can comment that we have a similar situation with AzureWatch and are definitely using pre-aggregated values in tables.
Mostly for performance reasons -- table storage is not instantaneous even if you query by a single partition key and a range in the row key. The time it takes to download the records is significant, and depending on the records it might spike the CPU, because the data needs to be deserialized into objects. If you have to go to table storage multiple times because of the 1000-record limit, you'll be paying more as well.
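To make the round-trip point concrete, reading a pre-aggregated total is a single point read on PartitionKey/RowKey; a minimal sketch, reusing the Python azure-data-tables SDK and the illustrative names from the sketches in the question:

```python
# Reading a pre-aggregated total is one point read -- one round trip, one tiny
# entity to deserialize -- instead of paging through every raw transaction.
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<connection string>", table_name="Transactions"
)
total_row = table.get_entity(
    partition_key="ITEMID", row_key="20101101_ITEMID_TOTAL"
)
print(float(total_row["Total"]))
```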
Some other thoughts to consider:
Will your aggregated totals ever change? If not, this is another nudge toward pre-aggregation.
Will you need to keep aggregated values after the raw data is gone, or will you ever need to purge raw data? If yes, that is another nudge toward pre-aggregation.