
Are queries against Azure Table storage indexed when using a partial RowKey?

Posted 2024-10-16 14:52:18

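I understand from the MS PDC presentations that the PartitionKey is used to load-balance a table across multiple servers, but nobody seems to give any advice on whether the PartitionKey is also used as an index within a single server.

Likewise, everybody will tell you that specifying both the PartitionKey and the RowKey gives great performance, but nobody seems to say whether the RowKey is used to improve performance within a PartitionKey.

Here are some example queries to help frame my question. Assume the whole table contains 100,000,000 rows.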

1. PartitionKey="123" and OtherField="def"
2. PartitionKey="123" and RowKey >= "aaa" and RowKey < "aac"
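For reference, here is a minimal sketch of how the two queries might be written as OData filter strings, which is the filter syntax the Table service accepts. The helper names (`odata_quote`, `query1_filter`, `query2_filter`) are made up for illustration, and `OtherField` is just the example property from the queries above.

```python
def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def query1_filter(partition_key: str, other_value: str) -> str:
    # Query 1: a single partition, filtered on a non-key property.
    return (
        f"PartitionKey eq {odata_quote(partition_key)} "
        f"and OtherField eq {odata_quote(other_value)}"
    )


def query2_filter(partition_key: str, low: str, high: str) -> str:
    # Query 2: a single partition, half-open RowKey range [low, high).
    return (
        f"PartitionKey eq {odata_quote(partition_key)} "
        f"and RowKey ge {odata_quote(low)} and RowKey lt {odata_quote(high)}"
    )


print(query1_filter("123", "def"))
print(query2_filter("123", "aaa", "aac"))
```

The RowKey comparison is a string comparison, so the range in query 2 covers every RowKey that sorts at or after "aaa" and before "aac".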

Here are my questions:

1. Will query 1 be fast if there are only 10 rows in each partition?
2. Will query 2 be fast if there are 1,000,000 rows in each partition?


无尽的现实 2024-10-23 14:52:19

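Both should be fairly fast.

Query 1 has to perform a full scan within a single partition (a range scan in ATS parlance), but that only means iterating over 10 entities.

Query 2 also results in a range scan, but it uses the RowKey as an index within the partition, so it should still be fast.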
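The difference between the two access patterns can be sketched with a toy in-memory model. This is only an illustration, assuming a partition behaves like a list of entities kept sorted by RowKey; it says nothing about how the service is implemented internally, and the function names are invented for the example.

```python
import itertools
import string
from bisect import bisect_left


def scan_partition(rows, predicate):
    """Query 1 style: every entity in the partition has to be examined."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if predicate(row):
            matches.append(row)
    return matches, examined


def row_key_range(rows, row_keys, low, high):
    """Query 2 style: binary-search the sorted RowKeys for [low, high)."""
    start = bisect_left(row_keys, low)
    end = bisect_left(row_keys, high)
    # Only the entities inside the range are read (the search steps are not counted).
    return rows[start:end], end - start


# Toy partition: 26**3 = 17,576 entities whose RowKeys "aaa".."zzz" are already sorted.
row_keys = ["".join(p) for p in itertools.product(string.ascii_lowercase, repeat=3)]
rows = [{"RowKey": k, "OtherField": "def" if k.endswith("z") else "abc"} for k in row_keys]

scan_matches, scan_examined = scan_partition(rows, lambda r: r["OtherField"] == "def")
range_matches, range_examined = row_key_range(rows, row_keys, "aaa", "aac")

print(scan_examined, len(scan_matches))
print(range_examined, [r["RowKey"] for r in range_matches])
```

In this model the cost of the property filter grows with the size of the partition, while the cost of the RowKey range grows with the size of the range, which is why query 1 is fine at 10 rows and query 2 is fine at 1,000,000.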

There is a very detailed blog post that covers the performance implications of each kind of query and how to define optimal keys: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx


Oo萌小芽oO 2024-10-23 14:52:19

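In addition to Taylor's answer, a similar statement holds for range queries, as discussed here.

In other words, Azure Table storage can indeed be thought of as having just one index, made up of two parts: the partition key and the row key (in that order).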


雨轻弹 2024-10-23 14:52:19

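I think that is how it has been since the WAS paper was written, but if you read it, you can draw some conclusions.

For example, partitions can be moved between nodes/physical servers. If you have multiple partitions, you scale better than with a single partition. If you have one huge partition, you are limited by the throughput of that single partition.

Note that many small partitions (contiguous in range) can be moved onto a single node/physical server. If partitions are logically grouped close together (i.e. sorted), a scan across partitions will not necessarily be slower.

If you need a partition key to handle more than the 2,000 requests/second on offer, you have to find a way to split that partition key into multiple partitions; otherwise, it doesn't matter much.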
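One common way to split a hot partition key is to append a shard suffix derived from the RowKey. The sketch below assumes that approach; the function names, the `-NN` suffix format and the choice of SHA-256 are illustrative choices, not something prescribed by the service.

```python
import hashlib


def sharded_partition_key(base_key: str, row_key: str, shard_count: int) -> str:
    """Spread one logical partition over `shard_count` physical partition keys."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    # A stable hash keeps the same entity in the same shard across processes.
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}-{shard:02d}"


def all_shards(base_key: str, shard_count: int) -> list:
    """Every partition key that has to be queried to read the whole logical partition."""
    return [f"{base_key}-{i:02d}" for i in range(shard_count)]


print(sharded_partition_key("123", "aaa", 8))
print(all_shards("123", 8))
```

The trade-off is that reading the whole logical partition now takes one query per shard, RowKey range queries no longer see one sorted run, and entity group transactions can only cover entities that land in the same shard.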

Oh, and you can only perform entity group transactions within a single partition key, which may affect your design.
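As a rough sketch of what that constraint means in practice, a client-side check before submitting a batch could look like the following. `validate_batch` is a hypothetical helper, entities are modeled as plain dicts, and the 100-entity limit is the per-batch limit documented for the Table service.

```python
MAX_BATCH_ENTITIES = 100  # documented per-batch limit for entity group transactions


def validate_batch(entities):
    """Return the shared PartitionKey, or raise ValueError if the batch is invalid."""
    if not entities:
        raise ValueError("batch is empty")
    if len(entities) > MAX_BATCH_ENTITIES:
        raise ValueError("batch has more than 100 entities")
    partition_keys = {e["PartitionKey"] for e in entities}
    if len(partition_keys) != 1:
        raise ValueError("all entities in a batch must share one PartitionKey")
    row_keys = [e["RowKey"] for e in entities]
    if len(set(row_keys)) != len(row_keys):
        raise ValueError("an entity may appear only once in a batch")
    return partition_keys.pop()


batch = [
    {"PartitionKey": "123", "RowKey": "aaa"},
    {"PartitionKey": "123", "RowKey": "aab"},
]
print(validate_batch(batch))
```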

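To recap:

1. Do you need more than 2,000 requests/second?
2. Do you need entity group transactions?

These are the two questions you need to ask yourself.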


月下伊人醉 2024-10-23 14:52:18

In ATS, PartitionKey is used as a distribution lookup, not an index. From the level of working with ATS, just consider PartitionKey and "server"/node to share a 1:1 relationship. (Behind the scenes this isn't true, but concepts such as optimizing PartitionKeys that happen to reside on the same physical/virtual node are abstracted several levels away from what a consumer of Azure has to deal with. Those details are purely internal to the overall Azure infrastructure, and in the case of ATS it's best to assume that it is as optimal as it can be ... aka "don't worry about it".)

In the context of a DBMS vs ATS, RowKey is the closest thing to an "index", in that it helps locate data within a node. To directly answer one of your questions, RowKey is the index within the PartitionKey.

Stepping outside the box a bit, however, PartitionKey can give you perf gains closer to how you think of a traditional index, but only because of the distributed nature of how your data is spread across ATS nodes. You should optimize your layout first for the PartitionKey, then for the RowKey. (That is, if you only have one keyable value, make it the PartitionKey.)

As a general rule, queries perform in this order, from most efficient to least efficient:

1. PartitionKey=x and RowKey=y (and OtherProp=z)

because the lookup gets to the right node and then to an indexed property on the partition

2. PartitionKey=x (and OtherProp=z)

because you get to the proper node, but then do the ATS equivalent of a full table scan

3. OtherProp=z

because you have to do a partition scan, then a table scan
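The three tiers above can be summarized in a small classifier. This is only a sketch of the ranking described in this answer: `classify_query` is an invented helper, and the tier labels are the names commonly used for these access patterns rather than anything returned by the service.

```python
def classify_query(has_partition_key: bool, has_row_key: bool) -> str:
    """Map the key properties constrained by a filter to one of the three tiers."""
    if has_partition_key and has_row_key:
        return "point query"      # tier 1: right node, then the RowKey index
    if has_partition_key:
        return "partition scan"   # tier 2: right node, then scan that partition
    return "table scan"           # tier 3: every partition has to be visited


print(classify_query(True, True))
print(classify_query(True, False))
print(classify_query(False, False))
```

Note that a filter on RowKey alone still falls into the last tier, because without the PartitionKey there is no way to pick the node to start from.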

With that, to your direct questions:

1. I don't feel this can be answered. It's subjective (i.e. "what is fast?"). It will always be slower than Query 2, but with 10 rows that "slowness" is likely milliseconds, if even that.
2. (Similar theme.) It will be faster than Query 1. Any time you can do Query 2, you should.

So with that explanation and your questions, the real answer comes down to how you architect your usage of ATS.

Based on your data set (both current and expected growth), you need to determine a proper scheme so that you can get to your partition AND to your row in the fastest way possible. Knowing how the lookup occurs, you can make logical decisions as to which path is going to get you there fast enough: more partitions with fewer rows versus fewer partitions with more rows, etc.
