How does one Azure Table storage table with many partition keys compare to many tables with fewer partition keys?
I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that the Partition Key is formatted like {root}_{child1}_{child2}_{leaf}. I can understand how it might be beneficial to divide this one big TableA into many tables by using the root dimension of the Partition Keys in the naming of the Tables (so the Partition Key would become {child1}_{child2}_{leaf}).
What I want to do is provide as rapid access to this data as I can from as many connections at the same time as possible. It would also be incredible if I could figure out what these limits are or should be.
More specific questions about my proposed change:
- Will this make a difference in scalability, i.e. the number of simultaneous data access requests that can be served without affecting performance dramatically? Or that can be served at the same time at all?
- Will this make a difference in average performance? Potential performance?
If every query specifies a partition key, it makes no difference how many tables those partitions are spread across. In other words, the following are equivalent: one table with a thousand partitions versus a thousand tables each with one partition.
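To make the equivalence concrete, here is a minimal Python sketch (not part of the original answer) of the kind of query the question describes: one partition plus a range of row keys. The helper name `partition_range_filter` and the sample keys are hypothetical; the filter string uses the OData comparison operators (`eq`, `ge`, `lt`) that Table storage queries accept. Note that the table name never appears in the filter, so the query has the same shape whether the partition sits in one big table or in a table of its own.

```python
def _quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def partition_range_filter(partition_key: str, row_from: str, row_to: str) -> str:
    """Build a filter for one partition and a half-open row-key range [row_from, row_to)."""
    return (
        f"PartitionKey eq {_quote(partition_key)}"
        f" and RowKey ge {_quote(row_from)}"
        f" and RowKey lt {_quote(row_to)}"
    )


# Hypothetical keys: one table holding the full flattened key ...
single_table = partition_range_filter("root_child1_child2_leaf", "0100", "0200")
# ... versus a per-root table holding the shortened key.
per_root_table = partition_range_filter("child1_child2_leaf", "0100", "0200")

print(single_table)
print(per_root_table)
```

Either way the request targets exactly one partition, which is the case the answer treats as equivalent.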
The main reason I can think of to consider splitting out into multiple tables is that you can delete an entire table in a single operation/transaction, while you can't do that with a range of partitions within the same table. That means for things like logs, where you may want to delete the older ones after a while, it's often better to have different tables for different time ranges.
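As a rough illustration of the table-per-time-range idea, the following Python sketch (an assumption-laden example, not the answerer's code) names one log table per month and works out which tables have aged out. The `logs` prefix, the monthly granularity, and both function names are hypothetical. The sketch deliberately calls no storage API: each returned name would then be dropped with a single delete-table call in whichever client library you use.

```python
from datetime import date


def log_table_name(day: date, prefix: str = "logs") -> str:
    """Name the table holding logs for the month containing `day`, e.g. logs201203."""
    return f"{prefix}{day.year:04d}{day.month:02d}"


def expired_tables(table_names, today: date, keep_months: int, prefix: str = "logs"):
    """Return the tables whose month is older than the newest `keep_months` months."""
    # Months are compared as a single running count: year * 12 + (month - 1).
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in table_names:
        suffix = name[len(prefix):]
        if not (name.startswith(prefix) and len(suffix) == 6 and suffix.isdigit()):
            continue  # not one of the monthly log tables
        month_index = int(suffix[:4]) * 12 + (int(suffix[4:]) - 1)
        if month_index <= cutoff:
            expired.append(name)
    return expired


# Hypothetical data: three monthly tables, keeping the two most recent months.
names = [log_table_name(date(2012, m, 1)) for m in (1, 2, 3)]
to_delete = expired_tables(names, today=date(2012, 3, 15), keep_months=2)
print(to_delete)
```

The names stay purely alphanumeric and start with a letter, which keeps them within Table storage naming rules.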
+1 for Steve's answer.
Some things to add