“每个 A 的前 5 个 B”的性能提示MDX 查询? (特别是SSAS)
“给我每个 A 的前 5 个 B”的典型 MDX 示例如下所示:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
NON EMPTY generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount([Clients].[Client Name].Children, 5, [Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
我正在设计一个 UI 原型,该 UI 会执行大量“前 5 个”类型查询,因此我正在寻找任何加快速度的提示特别是,尤其是当 crossjoin(A, B) 的度量值大部分为空时。
在这种特殊情况下,公司 x 客户很大(n=5000 xn=20,000,或多或少)并且稀疏,通过用过滤器替换 NON EMPTY(NOT ISEmpty),我能够将速度提高约 100 倍在交叉联接内部:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount(
filter([Clients].[Client Name].Children, NOT IsEmpty([Measures].[Amount]))
5,
[Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
我希望通过运行类似的查询来预热 SSAS 缓存,从而获得进一步的性能优势,但我发现了上述查询(即带有“filter”的查询)当针对热缓存和冷缓存运行时,速度同样慢。通过使用 Sql Profiler,我发现了部分原因:虽然 SSAS 缓存了底层多维数据集数据的部分,但它似乎并没有缓存整个查询的结果,也似乎没有缓存由生成或交叉连接创建的中间集。因此,每次重复查询时,它都必须重做生成、交叉连接和顶部计数。即使每个 topcount 看起来只需要 1 毫秒或更短时间,但当迭代数千家公司时,这些毫秒就会累加起来。
关于我可以在 MDX 级别、多维数据集级别或 Sql Server 调整级别做什么的任何提示?在 SQL 世界中,我可以通过在我想要排序的列上创建索引来获得一些好处。但据我所知,OLAP 还没有这样的东西。
可以想象,这是一个更普遍的问题,即当您想要加速涉及复杂排序和过滤的 MDX 查询时该怎么做(这显然不能 100% 适合 SSAS 缓存系统)。
Typical MDX examples for "give me the top 5 Bs for each A" look like this:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
NON EMPTY generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount([Clients].[Client Name].Children, 5, [Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
I'm prototyping a UI that does a lot of these "top 5" type queries, so I'm looking for any tips to speed them up in particular, especially when crossjoin(A, B) has mostly null measure values.
In this particular case Firms x Clients was big (n=5000 x n=20,000, more or less) and sparse, and I was able to speed things up by a factor of about 100 by replacing the NON EMPTY with a filter(NOT ISEmpty) inside the crossjoin:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount(
filter([Clients].[Client Name].Children, NOT IsEmpty([Measures].[Amount]))
5,
[Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
I was hoping to get a further performance benefit from pre-warming the SSAS caches by running similar queries, but I've discovered the above query (i.e. the one with "filter") is equally slow when run against warm and cold caches. Playing with Sql Profiler, I've discovered part of why this might be: While SSAS is caching portions of the underlying cube data, it doesn't seem to cache the results from the query-as-a-whole, nor does it seem to cache the intermediate sets created by generate or by crossjoin. Thus it has to redo the generate and crossjoin and topcount each time I repeat the query. And even even though each topcount appears to take 1ms or less, those milliseconds add up when iterating across thousands of Firms.
Any tips on what I can do, at the MDX level or the cube level or the Sql Server tuning level? In the SQL world I could get some mileage out of creating an index on the columns I wanted to sort by. As far as I know there's nothing like that for OLAP, though.
Conceivably this is a more general question of what to do when you want to speed up an MDX query that involves complicated sorting and filtering (which apparently doesn't sit 100% well with the SSAS caching system).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我已经使用了 SSAS 2008 更多的时间,根据经验,我为这些查询制定了一些速度原则:
考虑到这一点,这就是我现在的查询。
在本例中,我对 Set1 使用 NonEmpty,因为 Firms x Clients 是稀疏的,但我不对 Set2 使用 NonEmpty,因为 FirmClientPairs x Time 并不稀疏。
我还注意到,使用命名集可以更轻松地构造嵌套的“前五名中的前五名中的前五名中的前五名......”类型查询,这很有趣。
I've played with SSAS 2008 some more, and from experience I've developed a couple principles of speed for these queries:
With that in mind, here's what my queries are looking like now.
In this case I use NonEmpty for Set1, because Firms x Clients is sparse, but I don't use NonEmpty for Set2, because FirmClientPairs x Time is not sparse.
I also noticed that using named sets made it easier to construct nested "top-five-of-top-five-of-top-five..." type queries, which is kind of fun.
如何提高以下MDX查询性能?
让我解释一下我为提高性能所遵循的步骤。
How to improve the below MDX query perfomance?
Let me explain what are the steps i followed to improve the perfomance.