Azure Table Storage - choosing a PartitionKey and RowKey for range queries

Posted 2024-11-05 04:24:45

I am a total newbie with Azure! The purpose is to return rows based on the timestamp stored in the RowKey. As there is a transaction cost with each query, I want to minimize the number of transactions/queries whilst maintaining performance.

These are the proposed Partition and Row Keys:

  • Partition Key: TextCache_(AccountID)_(ParentMessageId)
  • Row Key: (DateOfMessage)_(MessageId)

Legend:

  • AccountId - is an integer
  • ParentMessageId - The parent messageId if there is one, blank if it is the parent
  • DateOfMessage - Date the message was created - format will be DateTime.Ticks.ToString("d19")
  • MessageId - the unique Id of the message

I would like to get back, from a single query, the rows and any child rows whose RowKey is > or < a given DateOfMessage_MessageId.

Can this be done via my proposed PartitionKeys and RowKeys?

i.e. (in pseudo code)

var results = ctx.PartitionKey.StartsWith(TextCache_AccountId) 
   && ctx.RowKey > (TimeStamp)_MessageId

Secondly, if I have a number of accounts and only want to return the first 10 rows, could it be done via a single query?

i.e. (in pseudo code)

var results = (
      (
        ctx.PartitionKey.StartsWith(TextCache_(AccountId1))
            && ctx.RowKey > (TimeStamp1)_MessageId1
      )
      ||
      (
        ctx.PartitionKey.StartsWith(TextCache_(AccountId2))
            && ctx.RowKey > (TimeStamp2)_MessageId2
      ) ...
          )
         .Take(10)

Comments (1)

甲如呢乙后呢 2024-11-12 04:24:45

The short answer to your questions is yes, but there are some things you need to watch for.

Azure table storage doesn't have a direct equivalent of .StartsWith(). If you're using the storage library in combination with LINQ you can use .CompareTo() instead (> and < don't translate properly), but a comparison only gives you a lower bound, not a prefix match. That means that if you run a search for account 1 and ask the query to return 1000 results, but there are only 600 results for account 1, the last 400 results will be for account 10 (the next account number lexically). So you'll need to be a bit smart about how you deal with your results.
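To see why a lower bound alone is not enough, here is a small sketch in Python that imitates the table's ordering with plain string comparison. It does not call the Azure storage library; the key values and the `query_from` helper are invented purely for illustration.

```python
# Simulated partition keys for three accounts with unpadded ids.
# The table service orders keys as strings, which sorted() imitates here.
partition_keys = sorted(
    [f"TextCache_1_{n}" for n in range(3)]
    + [f"TextCache_2_{n}" for n in range(3)]
    + [f"TextCache_10_{n}" for n in range(3)]
)


def query_from(lower_bound, limit):
    """Hypothetical stand-in for a query with only a lower bound and a row limit."""
    return [pk for pk in partition_keys if pk >= lower_bound][:limit]


results = query_from("TextCache_1", limit=5)

# Which account ids ended up in the result set?
accounts = {pk.split("_")[1] for pk in results}
print(results)
print(accounts)
```

In this simulation the five rows requested for account 1 include rows belonging to account 10, because both sets of keys satisfy the same lower bound and the two accounts sit next to each other in string order.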

If you padded out the account id with leading 0s you could do something like this (pseudo code here as well):

ctx.PartitionKey > "TextCache_0000000001"
&& ctx.PartitionKey < "TextCache_0000000002"
&& ctx.RowKey > "123465798"
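The padded range check can be sketched as follows. This is a plain-Python simulation under stated assumptions, not the storage library: the 10-digit padding width, the sample rows, and the helper names (`partition_prefix`, `account_bounds`, `in_range`) are all hypothetical.

```python
def partition_prefix(account_id, width=10):
    """Zero-pad the account id so string order matches numeric order."""
    return f"TextCache_{account_id:0{width}d}"


def account_bounds(account_id):
    """Exclusive lower and upper bounds that enclose one account's partitions."""
    return partition_prefix(account_id), partition_prefix(account_id + 1)


def row_key(ticks, message_id):
    """19-digit zero-padded ticks, then the message id (ticks values are made up)."""
    return f"{ticks:019d}_{message_id}"


def in_range(partition_key, rk, account_id, after_row_key):
    """Same shape as the pseudo code: two partition bounds plus a row-key bound."""
    low, high = account_bounds(account_id)
    return low < partition_key < high and rk > after_row_key


rows = [
    (partition_prefix(1) + "_", row_key(100, "m1")),    # account 1, too early
    (partition_prefix(1) + "_m1", row_key(200, "m2")),  # account 1, child of m1
    (partition_prefix(2) + "_", row_key(300, "m3")),    # account 2
    (partition_prefix(10) + "_", row_key(400, "m4")),   # account 10
]

matches = [r for r in rows if in_range(r[0], r[1], 1, f"{150:019d}")]
print(matches)
```

With both bounds in place, only the account 1 row after the cut-off survives; accounts 2 and 10 fall outside the upper bound.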

Something else to bear in mind is that queries to Azure Tables return their results in PartitionKey then RowKey order. So in your case messages without a ParentMessageId will be returned before messages with a ParentMessageId. If you're never going to query this table by ParentMessageId I'd move this to a property.
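A short Python simulation of that ordering, using tuple sorting as a stand-in for the PartitionKey-then-RowKey order. The message ids and tick values are invented, and keeping the parent id as a separate property is modelled here simply by dropping it from the partition key.

```python
def row_key(ticks, message_id):
    """19-digit zero-padded ticks, then the message id (ticks values are made up)."""
    return f"{ticks:019d}_{message_id}"


# Each tuple is (PartitionKey, RowKey).
entities = [
    ("TextCache_0000000001_m1", row_key(200, "m2")),  # reply to m1
    ("TextCache_0000000001_", row_key(100, "m1")),    # top-level message
    ("TextCache_0000000001_", row_key(300, "m3")),    # later top-level message
]

# Tuples sort by first element, then second: PartitionKey, then RowKey.
returned = sorted(entities)
message_order = [rk.split("_")[1] for _, rk in returned]

# Same rows with the parent id moved out of the key: one partition per account,
# so rows come back in RowKey (date) order.
flattened = sorted(("TextCache_0000000001", rk) for _, rk in entities)
flat_order = [rk.split("_")[1] for _, rk in flattened]

print(message_order)
print(flat_order)
```

In the first ordering the reply m2 comes back after m3 even though it is older, because its partition key sorts later. With the parent id out of the key, the three messages come back in date order.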

If TextCache_ is just a string constant, it's not adding anything by being included in the PartitionKey unless this will actually mean something to your code when it's returned.

While your second query will run, I don't think it will produce what you're after. If you want the first ten rows in DateOfMessage order, then it won't work (see my point above about sort orders). If you ran this query as it is and account 1 had 11 messages, it would return only the first 10 messages related to account 1, regardless of whether account 2 had an earlier message.
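The effect, and one possible workaround, can be sketched in Python. This is a simulation with invented data, not the storage library, and the workaround (one query per account, merged on the client) is an assumption about what might suit this case rather than something the table service does for you. It costs one transaction per account.

```python
import heapq
from itertools import islice


def row_key(ticks, message_id):
    """19-digit zero-padded ticks, then the message id (ticks values are made up)."""
    return f"{ticks:019d}_{message_id}"


# Account 1 has 11 messages at ticks 100..110; account 2 has one earlier message.
table = sorted(
    [("0000000001", row_key(100 + i, f"a{i}")) for i in range(11)]
    + [("0000000002", row_key(50, "b0"))]
)

# What a single query with Take(10) yields: the first 10 rows in key order.
single_query = table[:10]


def first_rows_for(account, limit):
    """Hypothetical per-account query: that account's first `limit` rows by RowKey."""
    return list(islice((row for row in table if row[0] == account), limit))


# One query per account, then merge by RowKey on the client and keep the earliest 10.
per_account = [first_rows_for(a, 10) for a in ("0000000001", "0000000002")]
earliest = list(islice(heapq.merge(*per_account, key=lambda row: row[1]), 10))

print(single_query[0], earliest[0])
```

Here `single_query` contains only account 1 rows, while `earliest` starts with account 2's older message.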

While trying to minimise the number of transactions you use is good practice, don't be too concerned about it. The cost of running your worker/web roles will dwarf your transaction costs. 1,000,000 transactions will cost you $1 which is less than the cost of running one small instance for 9 hours.
