Can this be done with Azure Tables?

Posted on 2025-01-04 13:23:05

I get an error message, “the method ‘join’ is not supported”, from the LINQ query below:

tableServiceContext = new CustomTableServiceContext(storageAccount.TableEndpoint.AbsoluteUri, storageAccount.Credentials);
tableServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
var results = (from c in tableServiceContext.CreateQuery<ChannelEntry>("Channels").AsTableServiceQuery<ChannelEntry>()
    join v in tableServiceContext.CreateQuery<VideoEntry>("Videos").AsTableServiceQuery<VideoEntry>() on c.PartitionKey equals v.ChannelID
    join h in tableServiceContext.CreateQuery<HitEntry>("Hits").AsTableServiceQuery<HitEntry>() on v.PartitionKey equals h.VideoID
    where c.RowKey.Equals(UserID)
    group h by h.RowKey into g
    select new BiggestFan { UserID = g.Key, Hits = g.Count() }).AsTableServiceQuery().Execute().OrderByDescending(b => b.Hits).Take(1);

If “join” is not supported in this context, what would be the most efficient way to run my query?

I have Channels, which are made up of Videos, which in turn have Hits. I’m trying to find the biggest fan (the user with the most hits) of the currently logged-in user.

What would be the most efficient way of doing this without joins? Would I have to fetch all the Channels, then the Videos, then the Hits as three separate calls to Table Storage, and then do the joins after that?

廻憶裏菂餘溫 2025-01-11 13:23:05

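Yeah, you can't do joins. You have a couple of options here.

1) Multiple scans: call .ToArray() on each query before the join, so that the join runs in your application's memory. This isn't very performant, but Table Storage is pretty fast; it really depends on how many rows this will pull back.

2) Denormalize your tables so that all the keys you need can be referenced from a single table. This lets you get the result in one query, but it means all of your insert/update logic has to change accordingly.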
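Option 1 can be sketched with LINQ to Objects. This is a minimal sketch, assuming the three tables have already been downloaded into arrays (for example with `.AsTableServiceQuery().Execute().ToArray()` on each query). The entity classes are simplified stand-ins for the question's `ChannelEntry`, `VideoEntry` and `HitEntry` that keep only the properties the join uses, and `InMemoryJoin`/`FindBiggestFan` are hypothetical names. The join, grouping and count are the ones from the question, just executed in memory:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the question's entities (assumption: only the
// properties used by the join are modelled here).
public class ChannelEntry { public string PartitionKey = ""; public string RowKey = ""; }
public class VideoEntry { public string PartitionKey = ""; public string ChannelID = ""; }
public class HitEntry { public string RowKey = ""; public string VideoID = ""; }

public class BiggestFan { public string UserID = ""; public int Hits; }

public static class InMemoryJoin
{
    // Runs the question's join/group/count entirely in memory, after the three
    // tables have been materialised (e.g. with .ToArray()).
    // Returns null when the user's channels have no hits.
    public static BiggestFan FindBiggestFan(
        ChannelEntry[] channels, VideoEntry[] videos, HitEntry[] hits, string userId)
    {
        return (from c in channels
                where c.RowKey == userId                 // channels owned by this user
                join v in videos on c.PartitionKey equals v.ChannelID
                join h in hits on v.PartitionKey equals h.VideoID
                group h by h.RowKey into g               // hit RowKey = fan's user id, as in the question
                select new BiggestFan { UserID = g.Key, Hits = g.Count() })
               .OrderByDescending(b => b.Hits)
               .FirstOrDefault();
    }
}
```

Filtering on the channel owner before the joins keeps the in-memory work small, but every video and hit still has to be downloaded first, which is the cost that limits this option as the tables grow.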

林空鹿饮溪 2025-01-11 13:23:05

There are three things in your query that Azure Table Storage (AZT, my abbreviation, not generally used by others) queries do not support:

Joins
Grouping
Aggregate functions

The short version is that if you want to run an efficient query in AZT then you need to run it against just one table and query against the partition key or partition key and row key.

This doesn't mean that your base data has to be stored in just this one table, you can keep the structure that you currently have, but you may need to build a table that is basically an index to allow you to get the info that you want. It might have a structure similar to this:

PartitionKey = ChannelUserId.PadWithLeadingZeros() + "-" + (int.MaxValue - NumberOfHits).PadWithLeadingZeros();
RowKey = FanUserId;
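A minimal sketch of how these keys might be built, assuming numeric user ids and a fixed pad width of 10 digits (enough for any non-negative `int`). `FanIndexKeys` and `PartitionKeyFor` are hypothetical names, and this `PadWithLeadingZeros` is only a guess at the helper named above. Subtracting the hit count from `int.MaxValue` inverts the order, so when keys are compared as strings the fan with the most hits sorts first, which lines up with Table Storage returning entities in PartitionKey/RowKey order:

```csharp
using System;
using System.Globalization;

public static class FanIndexKeys
{
    // Hypothetical helper: int.MaxValue has 10 digits, so "D10" gives every
    // non-negative int a fixed-width, zero-padded string.
    public static string PadWithLeadingZeros(this int value) =>
        value.ToString("D10", CultureInfo.InvariantCulture);

    // PartitionKey = padded channel owner id + "-" + padded (int.MaxValue - hits).
    // More hits => smaller suffix => the key sorts earlier.
    public static string PartitionKeyFor(int channelUserId, int numberOfHits) =>
        channelUserId.PadWithLeadingZeros() + "-" + (int.MaxValue - numberOfHits).PadWithLeadingZeros();
}
```

The zero-padding matters because keys are compared as strings: without it, "10" would sort before "9".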

Your query would then look something like this:

tableServiceContext = new CustomTableServiceContext(storageAccount.TableEndpoint.AbsoluteUri, storageAccount.Credentials);
tableServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
// UserId is assumed to be numeric; PadWithLeadingZeros is the same helper used to build the keys.
// The range [Pad(UserId), Pad(UserId + 1)) covers every index row for this channel owner.
var results = (from i in tableServiceContext.CreateQuery<BiggestFansIndex>("BiggestFansIndex").AsTableServiceQuery<BiggestFansIndex>()
    where i.PartitionKey.CompareTo(UserId.PadWithLeadingZeros()) >= 0
        && i.PartitionKey.CompareTo((UserId + 1).PadWithLeadingZeros()) < 0
    select i).Take(1).AsTableServiceQuery().Execute();

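I suspect your biggest problem will be keeping this index table up to date, since hit counts will change fairly often and, because the count is part of the PartitionKey, every change means deleting the fan's old index row and inserting a new one.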
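One way to picture the maintenance problem: because the hit count is baked into the PartitionKey and an existing entity's keys cannot be changed, each new hit means removing the fan's old index row and adding a new one. The sketch below simulates the index table in memory (a `SortedSet` ordered by PartitionKey, then RowKey, standing in for Table Storage) and mimics the range query above. `FanIndexSimulation` and its methods are hypothetical names, and it uses `string.CompareOrdinal` because the in-memory `string.CompareTo` is culture-sensitive:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// In-memory stand-in for the BiggestFansIndex table (assumption: it only
// mimics Table Storage's ordering by PartitionKey, then RowKey).
public class FanIndexSimulation
{
    private readonly SortedSet<(string PartitionKey, string RowKey)> rows =
        new SortedSet<(string, string)>(Comparer<(string, string)>.Create((a, b) =>
        {
            int c = string.CompareOrdinal(a.Item1, b.Item1);
            return c != 0 ? c : string.CompareOrdinal(a.Item2, b.Item2);
        }));

    // Current hit count per (owner, fan), so the old index row can be found.
    private readonly Dictionary<(int Owner, string Fan), int> counts = new Dictionary<(int, string), int>();

    private static string Pad(int value) => value.ToString("D10", CultureInfo.InvariantCulture);
    private static string Key(int owner, int hits) => Pad(owner) + "-" + Pad(int.MaxValue - hits);

    // Keys are immutable, so a changed hit count means: delete the old row, insert a new one.
    public void RecordHit(int owner, string fan)
    {
        counts.TryGetValue((owner, fan), out int old);
        if (old > 0) rows.Remove((Key(owner, old), fan));
        counts[(owner, fan)] = old + 1;
        rows.Add((Key(owner, old + 1), fan));
    }

    // Same shape as the range query: Pad(owner) <= PartitionKey < Pad(owner + 1), Take(1).
    // Returns null when the owner has no fans.
    public string BiggestFan(int owner)
    {
        string lower = Pad(owner), upper = Pad(owner + 1);
        return rows.Where(r => string.CompareOrdinal(r.PartitionKey, lower) >= 0
                            && string.CompareOrdinal(r.PartitionKey, upper) < 0)
                   .Select(r => r.RowKey)
                   .FirstOrDefault();
    }
}
```

In Table Storage itself the delete and the insert would be separate requests: the old and new rows sit in different partitions, and batch (entity group) transactions are limited to a single partition, so the index can be briefly out of step if something fails in between.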

梦里兽 2025-01-11 13:23:05

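Azure Table Storage is not well suited to this kind of aggregate query. I'd suggest looking into NoSQL document databases such as CouchDB, MongoDB and RavenDB. But if you still want to use Table Storage, you'll need to denormalize your data.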

木緿 2025-01-11 13:23:05

What others have said about not being able to do JOINs in Azure Tables is correct. You can move it to SQL Azure where JOINs work as you expect, but it's far more expensive and slower than Azure tables. However, assuming you stick with Azure tables:

For this specific query, you could set up the keys of the Hits table like this:

Hits Table:
PartitionKey = UserId (of the owner of the channel)
RowKey = Timestamp (or something else unique)
UserId (of the user that performed the hit)
ChannelID
VideoID
(and other fields you want on the Hits table)
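As a sketch of what a row in this table might look like: this is hypothetical, written as a plain class so it compiles on its own (with the storage client library of the time it would typically derive from `TableServiceEntity`). One assumption worth flagging is that a timestamp alone may not be unique if two hits land in the same tick, so this sketch appends a GUID to the RowKey:

```csharp
using System;
using System.Globalization;

// Hypothetical shape of a row in the Hits table described above.
public class HitEntry
{
    public string PartitionKey { get; set; } = ""; // UserId of the channel owner
    public string RowKey { get; set; } = "";       // unique per hit
    public string UserId { get; set; } = "";       // user who performed the hit
    public string ChannelID { get; set; } = "";
    public string VideoID { get; set; } = "";

    // Assumption: ticks alone can collide when two hits arrive together, so a
    // GUID is appended. Ticks are padded to 19 digits so RowKeys sort by time.
    public static HitEntry Create(string channelOwnerId, string fanUserId,
                                  string channelId, string videoId, DateTime utcNow)
    {
        return new HitEntry
        {
            PartitionKey = channelOwnerId,
            RowKey = utcNow.Ticks.ToString("D19", CultureInfo.InvariantCulture) + "_" + Guid.NewGuid().ToString("N"),
            UserId = fanUserId,
            ChannelID = channelId,
            VideoID = videoId,
        };
    }
}
```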

As others have said, you can't do aggregation on Azure table storage queries, so you have to pull all the data back into local memory (by calling Execute), then you can do the aggregation in memory. Here is how to pull data from Table storage (this query is run on the Azure Table Storage server):

var allHits = 
    (
      from h in tableServiceContext.CreateQuery<HitEntry>("Hits")
      where h.PartitionKey == CurrentUserId  // The currently logged-in user (the channel owner)
      select h
    ).AsTableServiceQuery().Execute();

And then here is how you could aggregate it (this query is run in local memory):

var result = 
    (
      from h in allHits
      group h by h.UserId into g  // The User that performed the Hit
      select new BiggestFan { UserID = g.Key, Hits = g.Count() }
    )
    .OrderByDescending(b => b.Hits).FirstOrDefault();

This will technically work, but it won't scale. Once various users become popular, it will be impractical to pull down all of a user's hits into local memory to run this query. Plus you'll probably end up having to do paging on the data once it becomes too large to pull down all at once.

You could go further in denormalizing the data and calculate and store various totals as you go, so that when you need to run this Biggest-Fan query, all you need to retrieve are various pre-calculated totals.
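A minimal in-memory sketch of the pre-calculated totals idea, assuming two hypothetical denormalized tables: one row per (channel owner, fan) holding that fan's hit count, and one row per channel owner holding the current top fan. Dictionaries stand in for the tables, and `FanTotals`, `RecordHit` and `BiggestFan` are illustrative names. The work moves to write time, so the Biggest-Fan query becomes a single lookup:

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for two denormalized tables (assumption):
//   FanTotals: PartitionKey = channel owner, RowKey = fan -> hit count
//   TopFan:    PartitionKey = channel owner               -> (fan, hit count)
public class FanTotals
{
    private readonly Dictionary<(string Owner, string Fan), int> totals = new Dictionary<(string, string), int>();
    private readonly Dictionary<string, (string Fan, int Hits)> topFan = new Dictionary<string, (string, int)>();

    // Called once per hit, at write time, instead of aggregating at read time.
    public void RecordHit(string owner, string fan)
    {
        totals.TryGetValue((owner, fan), out int count);
        count++;
        totals[(owner, fan)] = count;

        // Counts only ever grow, so the top fan changes only when someone overtakes it.
        if (!topFan.TryGetValue(owner, out var best) || count > best.Hits)
            topFan[owner] = (fan, count);
    }

    // The Biggest-Fan query becomes a lookup of one pre-calculated row.
    public (string Fan, int Hits)? BiggestFan(string owner) =>
        topFan.TryGetValue(owner, out var best) ? best : ((string, int)?)null;
}
```

Against real Table Storage each increment is a read-modify-write, so concurrent hits would need optimistic concurrency (retrying when the entity's ETag has changed) to avoid losing counts.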

However, this is only one query. When designing your Azure table structure, you need to consider all of the queries you might want to do against them, how often they will be run, and how much data they will be operating against. Then you can figure out the best structure for your data in Azure Tables. I would recommend against designing your Azure tables around a single query, as you will likely need more queries in the future.
