How to choose an Azure Table PartitionKey and RowKey for a table that already has unique attributes
My entity is a key-value pair. 90% of the time I'll be retrieving the entity by key, but 10% of the time I'll also do a reverse lookup, i.e. search by value and get the key.
The key and the value are both guaranteed to be unique, so their combination is guaranteed to be unique as well.
Is it correct to use the key as PartitionKey and the value as RowKey?
I believe this will also ensure that my data is perfectly load balanced between servers, since every PartitionKey is unique.
Are there any problems with this decision?
Is it ever practical to use a hard-coded partition key, i.e. give all rows the same PartitionKey and keep only the RowKey unique?
Comments (1)
Is it doable? Yes, but depending on the size of your data, I'm not so sure it's a good idea. When you query on PartitionKey, Table storage can go directly to the exact partition and retrieve all your records. If you query on RowKey alone, Table storage has to check whether the row exists in every partition of the table. So if you have 1000 key-value pairs, searching by key reads a single partition/row, but searching by value alone reads all 1000 partitions!
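To make the cost difference concrete, here is a toy in-memory model of that layout. It is only a sketch: `ToyTable` and its methods are made-up names for illustration, not the Azure SDK, and it simply counts how many partitions each kind of lookup has to touch.

```python
class ToyTable:
    """Toy stand-in for a table: {PartitionKey: {RowKey: entity}}. Not the Azure SDK."""

    def __init__(self):
        self.partitions = {}
        self.partitions_read = 0  # how many partitions the queries have touched

    def insert(self, partition_key, row_key):
        rows = self.partitions.setdefault(partition_key, {})
        rows[row_key] = {"PartitionKey": partition_key, "RowKey": row_key}

    def get_by_partition_key(self, partition_key):
        # Lookup by PartitionKey: go straight to the one partition.
        self.partitions_read += 1
        return list(self.partitions.get(partition_key, {}).values())

    def scan_by_row_key(self, row_key):
        # Lookup by RowKey alone: every partition has to be checked.
        matches = []
        for rows in self.partitions.values():
            self.partitions_read += 1
            if row_key in rows:
                matches.append(rows[row_key])
        return matches


table = ToyTable()
for i in range(1000):
    # Key as PartitionKey, value as RowKey, as proposed in the question.
    table.insert(f"key-{i}", f"value-{i}")

table.partitions_read = 0
forward = table.get_by_partition_key("key-42")
forward_cost = table.partitions_read

table.partitions_read = 0
reverse = table.scan_by_row_key("value-42")
reverse_cost = table.partitions_read

print(forward_cost, reverse_cost)
```

In this model the forward lookup touches one partition and the reverse lookup touches all 1000, which is the asymmetry described above.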
I faced a similar problem and solved it in two ways:
1. Have two different tables, one with your key as the PartitionKey and the other with your value as the PartitionKey. Storage is cheap, so duplicating the data shouldn't cost much.
2. (What I finally did) If you're effectively returning single entities based on a unique key, just stick them in blobs (partitioned and pivoted as in point 1). You don't need to traverse a table, so don't.
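The two-table option above could look roughly like this. Again a sketch only: two plain dicts stand in for the two tables, and `KeyValueStore`, `put`, `value_for` and `key_for` are hypothetical names, not part of any Azure library.

```python
class KeyValueStore:
    """Sketch of the two-table idea; each dict stands in for one table."""

    def __init__(self):
        self.by_key = {}    # "table 1": the key is the PartitionKey
        self.by_value = {}  # "table 2": the value is the PartitionKey

    def put(self, key, value):
        owner = self.by_value.get(value)
        if owner is not None and owner != key:
            # Values are assumed unique, as stated in the question.
            raise ValueError(f"value {value!r} already belongs to {owner!r}")
        old_value = self.by_key.get(key)
        if old_value is not None and old_value != value:
            # Remove the stale reverse entry when a key gets a new value.
            del self.by_value[old_value]
        # Every write goes to both "tables".
        self.by_key[key] = value
        self.by_value[value] = key

    def value_for(self, key):
        return self.by_key.get(key)    # forward lookup, single partition

    def key_for(self, value):
        return self.by_value.get(value)  # reverse lookup, also single partition


store = KeyValueStore()
store.put("alice", "42")
store.put("bob", "7")
print(store.value_for("alice"), store.key_for("7"))
```

One thing to keep in mind with real tables: the two writes are separate operations, since Table storage batch transactions only cover entities in the same table and partition. So you would need to decide how to handle the case where one write succeeds and the other fails.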