“每个 A 的前 5 个 B”的性能提示MDX 查询？（特别是SSAS）

发布于 2024-12-18 17:01:15 字数 1438 浏览 2 评论 0原文

“给我每个 A 的前 5 个 B”的典型 MDX 示例如下所示：

-- return top 5 clients for each firm
select
  [Measures].[Amount] on columns,
  NON EMPTY generate(
    [Firms].[Firm Name].Children,
    crossjoin(
      [Firms].[Firm Name].CurrentMember,
      TopCount([Clients].[Client Name].Children, 5, [Measures].[Amount])
    )
  )
  on rows
from [FirmsAndClients]

我正在设计一个 UI 原型，该 UI 会执行大量“前 5 个”类型查询，因此我正在寻找任何加快速度的提示特别是，尤其是当 crossjoin(A, B) 的度量值大部分为空时。

在这种特殊情况下，公司 x 客户很大（n=5000 xn=20,000，或多或少）并且稀疏，通过用过滤器替换 NON EMPTY（NOT ISEmpty），我能够将速度提高约 100 倍在交叉联接内部：

-- return top 5 clients for each firm
select
  [Measures].[Amount] on columns,
  generate(
    [Firms].[Firm Name].Children,
    crossjoin(
      [Firms].[Firm Name].CurrentMember,
      TopCount(
        filter([Clients].[Client Name].Children, NOT IsEmpty([Measures].[Amount]))
               5,
               [Measures].[Amount])
    )
  )
  on rows
from [FirmsAndClients]

我希望通过运行类似的查询来预热 SSAS 缓存，从而获得进一步的性能优势，但我发现了上述查询（即带有“filter”的查询）当针对热缓存和冷缓存运行时，速度同样慢。通过使用 Sql Profiler，我发现了部分原因：虽然 SSAS 缓存了底层多维数据集数据的部分，但它似乎并没有缓存整个查询的结果，也似乎没有缓存由生成或交叉连接创建的中间集。因此，每次重复查询时，它都必须重做生成、交叉连接和顶部计数。即使每个 topcount 看起来只需要 1 毫秒或更短时间，但当迭代数千家公司时，这些毫秒就会累加起来。

关于我可以在 MDX 级别、多维数据集级别或 Sql Server 调整级别做什么的任何提示？在 SQL 世界中，我可以通过在我想要排序的列上创建索引来获得一些好处。但据我所知，OLAP 还没有这样的东西。

可以想象，这是一个更普遍的问题，即当您想要加速涉及复杂排序和过滤的 MDX 查询时该怎么做（这显然不能 100% 适合 SSAS 缓存系统）。

原文

Typical MDX examples for "give me the top 5 Bs for each A" look like this:

-- return top 5 clients for each firm
select
  [Measures].[Amount] on columns,
  NON EMPTY generate(
    [Firms].[Firm Name].Children,
    crossjoin(
      [Firms].[Firm Name].CurrentMember,
      TopCount([Clients].[Client Name].Children, 5, [Measures].[Amount])
    )
  )
  on rows
from [FirmsAndClients]

I'm prototyping a UI that does a lot of these "top 5" type queries, so I'm looking for any tips to speed them up in particular, especially when crossjoin(A, B) has mostly null measure values.

In this particular case Firms x Clients was big (n=5000 x n=20,000, more or less) and sparse, and I was able to speed things up by a factor of about 100 by replacing the NON EMPTY with a filter(NOT ISEmpty) inside the crossjoin:

-- return top 5 clients for each firm
select
  [Measures].[Amount] on columns,
  generate(
    [Firms].[Firm Name].Children,
    crossjoin(
      [Firms].[Firm Name].CurrentMember,
      TopCount(
        filter([Clients].[Client Name].Children, NOT IsEmpty([Measures].[Amount]))
               5,
               [Measures].[Amount])
    )
  )
  on rows
from [FirmsAndClients]

I was hoping to get a further performance benefit from pre-warming the SSAS caches by running similar queries, but I've discovered the above query (i.e. the one with "filter") is equally slow when run against warm and cold caches. Playing with Sql Profiler, I've discovered part of why this might be: While SSAS is caching portions of the underlying cube data, it doesn't seem to cache the results from the query-as-a-whole, nor does it seem to cache the intermediate sets created by generate or by crossjoin. Thus it has to redo the generate and crossjoin and topcount each time I repeat the query. And even even though each topcount appears to take 1ms or less, those milliseconds add up when iterating across thousands of Firms.

Any tips on what I can do, at the MDX level or the cube level or the Sql Server tuning level? In the SQL world I could get some mileage out of creating an index on the columns I wanted to sort by. As far as I know there's nothing like that for OLAP, though.

Conceivably this is a more general question of what to do when you want to speed up an MDX query that involves complicated sorting and filtering (which apparently doesn't sit 100% well with the SSAS caching system).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

将军与妓 2024-12-25 17:01:15

我已经使用了 SSAS 2008 更多的时间，根据经验，我为这些查询制定了一些速度原则：

空元组过滤可以有所帮助，但并不总是如此：正如我在原来的帖子中提到的，它可以加快速度将两个稀疏集连接在一起之前（而不是之后）过滤掉空元组。（我所说的“稀疏”是指当您交叉时，大多数测量值都为空。）如果您事先知道您要交叉的 MDX 集不是稀疏，那么 TopCount( NonEmpty）优化可能会适度减慢速度。
- 注意：如果您确实使用空元组过滤作为性能优化，则过滤的最佳位置似乎是在调用 TopCount 之前（即嵌套在其中）。
- 注意：在我原来的帖子中，我使用了模式 TopCount(Filter(... NOT IsEmpty(...)))。但可能会稍微快一点并且似乎与命名集更兼容的模式是使用 TopCount(NonEmpty(...), ....) 。
A) 使用命名集和 B) 在生成调用中使用 [MySet].Current 而不是 [My Dimension].[My Hierarchy].CurrentMember 的组合似乎也会带来适度的性能提升。
- 注意：之前我说过命名集给我带来了麻烦。不过，一旦我切换到 NonEmpty() 进行空元组过滤，我就没有遇到任何问题。

考虑到这一点，这就是我现在的查询。

WITH
SET [Set0] as [Firms].[Firm Name].Children
SET [Set1] as generate([Set0], crossjoin([Set0].Current, TopCount(NonEmpty([Clients].[Client Name].Children, [Measures].[Amount]), 5, [Measures].[Amount])))
SET [Set2] as generate([Set1], crossjoin([Set1].Current, TopCount([Time].[Year].Children, 5, [Measures].[Amount])))
SELECT
[Measures].[Amount] on columns,
NON EMPTY [Set2] on rows
FROM Lobbying2

在本例中，我对 Set1 使用 NonEmpty，因为 Firms x Clients 是稀疏的，但我不对 Set2 使用 NonEmpty，因为 FirmClientPairs x Time 并不稀疏。

我还注意到，使用命名集可以更轻松地构造嵌套的“前五名中的前五名中的前五名中的前五名......”类型查询，这很有趣。

I've played with SSAS 2008 some more, and from experience I've developed a couple principles of speed for these queries:

Empty tuple filtering can help, but it won't always: As I mentioned in my original post, it can speed things up an order of magnitude to filter out empty tuples before joining two sparse sets together rather than after. (By "sparse" I mean when you cross the two most of the measure values are null.) If you know in advance that the MDX sets you're crossing are not sparse, though, the TopCount(NonEmpty) optimization may slow things down moderately.
- Note: If you do use empty tuple filtering as a performance optimization, the best place to filter seems to be before (i.e. nested inside) the call to TopCount.
- Note: In my original post I used the pattern TopCount(Filter(... NOT IsEmpty(...))). But a pattern that might be slightly faster, and seems to be more compatible with named sets, is to use TopCount(NonEmpty(...), ....) instead.
The combination of A) using named sets and B) using [MySet].Current rather than than [My Dimension].[My Hierarchy].CurrentMember in your Generate call also seems to result in moderate performance gains.
- Note: Earlier I said named sets were giving me trouble. Once I switched to NonEmpty() for my empty tuple filtering, though, I had no problems with them.

With that in mind, here's what my queries are looking like now.

WITH
SET [Set0] as [Firms].[Firm Name].Children
SET [Set1] as generate([Set0], crossjoin([Set0].Current, TopCount(NonEmpty([Clients].[Client Name].Children, [Measures].[Amount]), 5, [Measures].[Amount])))
SET [Set2] as generate([Set1], crossjoin([Set1].Current, TopCount([Time].[Year].Children, 5, [Measures].[Amount])))
SELECT
[Measures].[Amount] on columns,
NON EMPTY [Set2] on rows
FROM Lobbying2

In this case I use NonEmpty for Set1, because Firms x Clients is sparse, but I don't use NonEmpty for Set2, because FirmClientPairs x Time is not sparse.

I also noticed that using named sets made it easier to construct nested "top-five-of-top-five-of-top-five..." type queries, which is kind of fun.

回复收藏 0 原文

甜味拾荒者 2024-12-25 17:01:15

如何提高以下MDX查询性能？

让我解释一下我为提高性能所遵循的步骤。

SELECT non empty([Measures].[HC_ALLOCATED_CNT_ASSIGNMENT]*[BI FCT PROJ ASSIGN].[ASSIGN CATEGORY 1].[ASSIGN CATEGORY 1].MEMBERS *[BI FCT PROJ ASSIGN].[ASSIGN CATEGORY 2].[ASSIGN CATEGORY 2].MEMBERS) ON AXIS(0),([BI DIM PROJECT].[PROJECT CODE].[PROJECT CODE].MEMBERS*[BI DIM PROJECT].[UNIT CODE].[UNIT CODE].MEMBERS*
        [BI DIM PROJECT].[SUBUNIT CODE].[SUBUNIT CODE].MEMBERS*
        [BI DIM PROJECT].[CURRENT PROJECT DESCRIPTION].[CURRENT PROJECT DESCRIPTION].MEMBERS*[BI DIM PROJECT].[CURRENT BILLING TYPE].[CURRENT BILLING TYPE].MEMBERS *[BI DIM PROJECT].[CURRENT MANAGED BY].[CURRENT MANAGED BY].MEMBERS *[BI DIM PROJECT].[CURRENT PROJECT OFFSHORE OU].[CURRENT PROJECT OFFSHORE OU].MEMBERS*[BI DIM PROJECT].[CURRENT ACCOUNT NAME].[CURRENT ACCOUNT NAME].MEMBERS*[BI DIM PROJECT].[CUSTOMER CODE].[CUSTOMER CODE].MEMBERS*     [BI DIM PROJECT].[CURRENT AD MAILID].[CURRENT AD MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT AM MAILID].[CURRENT AM MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PD MAILID].[CURRENT PD MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PM MAILID].[CURRENT PM MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PROJECT STATUS].[CURRENT PROJECT STATUS].MEMBERS*[BI DIM ASSOCIATE].[FULLNAME].[FULLNAME])ON AXIS(1) FROM [CUBE_IAMIS]

How to improve the below MDX query perfomance?

Let me explain what are the steps i followed to improve the perfomance.

SELECT non empty([Measures].[HC_ALLOCATED_CNT_ASSIGNMENT]*[BI FCT PROJ ASSIGN].[ASSIGN CATEGORY 1].[ASSIGN CATEGORY 1].MEMBERS *[BI FCT PROJ ASSIGN].[ASSIGN CATEGORY 2].[ASSIGN CATEGORY 2].MEMBERS) ON AXIS(0),([BI DIM PROJECT].[PROJECT CODE].[PROJECT CODE].MEMBERS*[BI DIM PROJECT].[UNIT CODE].[UNIT CODE].MEMBERS*
        [BI DIM PROJECT].[SUBUNIT CODE].[SUBUNIT CODE].MEMBERS*
        [BI DIM PROJECT].[CURRENT PROJECT DESCRIPTION].[CURRENT PROJECT DESCRIPTION].MEMBERS*[BI DIM PROJECT].[CURRENT BILLING TYPE].[CURRENT BILLING TYPE].MEMBERS *[BI DIM PROJECT].[CURRENT MANAGED BY].[CURRENT MANAGED BY].MEMBERS *[BI DIM PROJECT].[CURRENT PROJECT OFFSHORE OU].[CURRENT PROJECT OFFSHORE OU].MEMBERS*[BI DIM PROJECT].[CURRENT ACCOUNT NAME].[CURRENT ACCOUNT NAME].MEMBERS*[BI DIM PROJECT].[CUSTOMER CODE].[CUSTOMER CODE].MEMBERS*     [BI DIM PROJECT].[CURRENT AD MAILID].[CURRENT AD MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT AM MAILID].[CURRENT AM MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PD MAILID].[CURRENT PD MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PM MAILID].[CURRENT PM MAILID].MEMBERS *[BI DIM PROJECT].[CURRENT PROJECT STATUS].[CURRENT PROJECT STATUS].MEMBERS*[BI DIM ASSOCIATE].[FULLNAME].[FULLNAME])ON AXIS(1) FROM [CUBE_IAMIS]

回复收藏 0 原文

~没有更多了~