Typical Kimball star-schema data warehouse: are model views feasible, and how can they be code-generated?

Posted 2024-07-06 13:41:46


I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative):

SELECT cdim.x
    ,SUM(fact.y) AS y
    ,dim.z
FROM fact
INNER JOIN conformed_dim AS cdim
    ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
    ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
    ON ddim.date_id = fact.date_id
WHERE fact.date_id = @date_id
GROUP BY cdim.x
    ,dim.z

I'm thinking of replacing it with a view (MODEL_SYSTEM_1, say), so that it becomes:

SELECT m.x
    ,SUM(m.y) AS y
    ,m.z
FROM MODEL_SYSTEM_1 AS m
WHERE m.date_id = @date_id
GROUP BY m.x
    ,m.z
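A minimal, runnable sketch of the idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (so `?` replaces the `@date_id` parameter). The table and column names come from the question; the sample rows are made up, and the view definition is one plausible version of MODEL_SYSTEM_1, not the author's. It only checks that the query through the view returns the same rows as the hand-written join; it says nothing about performance.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory star schema plus a MODEL_SYSTEM_1 view over it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE conformed_dim    (cdim_dim_id  INTEGER PRIMARY KEY, x TEXT);
        CREATE TABLE nonconformed_dim (ncdim_dim_id INTEGER PRIMARY KEY, z TEXT);
        CREATE TABLE date_dim         (date_id      INTEGER PRIMARY KEY);
        CREATE TABLE fact (
            cdim_dim_id  INTEGER REFERENCES conformed_dim,
            ncdim_dim_id INTEGER REFERENCES nonconformed_dim,
            date_id      INTEGER REFERENCES date_dim,
            y            INTEGER
        );
        -- Hypothetical sample data.
        INSERT INTO conformed_dim    VALUES (1, 'a'), (2, 'b');
        INSERT INTO nonconformed_dim VALUES (10, 'p'), (20, 'q');
        INSERT INTO date_dim         VALUES (20240101), (20240102);
        INSERT INTO fact VALUES
            (1, 10, 20240101, 5), (1, 10, 20240101, 7),
            (2, 20, 20240101, 1), (2, 10, 20240102, 9);

        -- The view flattens the whole star; every output column name is unique.
        CREATE VIEW MODEL_SYSTEM_1 AS
        SELECT cdim.x, fact.y, dim.z, fact.date_id
        FROM fact
        INNER JOIN conformed_dim    AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id
        INNER JOIN nonconformed_dim AS dim  ON dim.ncdim_dim_id = fact.ncdim_dim_id
        INNER JOIN date_dim         AS ddim ON ddim.date_id     = fact.date_id;
    """)
    return con


# The original hand-written query (ORDER BY added so results compare reliably).
DIRECT = """
    SELECT cdim.x, SUM(fact.y) AS y, dim.z
    FROM fact
    INNER JOIN conformed_dim    AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id
    INNER JOIN nonconformed_dim AS dim  ON dim.ncdim_dim_id = fact.ncdim_dim_id
    INNER JOIN date_dim         AS ddim ON ddim.date_id     = fact.date_id
    WHERE fact.date_id = ?
    GROUP BY cdim.x, dim.z
    ORDER BY cdim.x, dim.z
"""

# The same query written against the view.
VIA_VIEW = """
    SELECT m.x, SUM(m.y) AS y, m.z
    FROM MODEL_SYSTEM_1 AS m
    WHERE m.date_id = ?
    GROUP BY m.x, m.z
    ORDER BY m.x, m.z
"""

con = build_demo()
direct = con.execute(DIRECT, (20240101,)).fetchall()
via_view = con.execute(VIA_VIEW, (20240101,)).fetchall()
print(direct)              # [('a', 12, 'p'), ('b', 1, 'q')]
print(direct == via_view)  # True
```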

But the view MODEL_SYSTEM_1 would have to contain unique column names. I'm also concerned about what the optimizer will do if I go ahead with this: because the view would span a whole star, I'm worried that the conditions in the WHERE clause on the different facts and dimensions won't all be optimized as well as they are now. And views cannot be parameterized (boy, wouldn't that be cool!)
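The unique-name problem can be checked mechanically: any column name that occurs in more than one table of the star collides once the whole star is flattened into one view. A small hypothetical Python helper like the one below, fed (table, column) pairs such as rows from INFORMATION_SCHEMA.COLUMNS, lists the clashes. The column list, including a description attribute on both dimensions, is made up for illustration.

```python
from collections import defaultdict


def find_collisions(columns: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each column name that occurs in more than one table to those tables.

    `columns` holds (table_name, column_name) pairs, e.g. rows taken from
    INFORMATION_SCHEMA.COLUMNS for the tables of one star.
    """
    tables_by_column: dict[str, list[str]] = defaultdict(list)
    for table, column in columns:
        tables_by_column[column].append(table)
    # Keep only the names that would clash in a single flattened view.
    return {col: tbls for col, tbls in tables_by_column.items() if len(tbls) > 1}


# Hypothetical star: keys repeat between the fact and its dimensions,
# and both dimensions happen to carry a "description" attribute.
star = [
    ("fact", "cdim_dim_id"), ("fact", "ncdim_dim_id"), ("fact", "date_id"), ("fact", "y"),
    ("conformed_dim", "cdim_dim_id"), ("conformed_dim", "x"), ("conformed_dim", "description"),
    ("nonconformed_dim", "ncdim_dim_id"), ("nonconformed_dim", "z"), ("nonconformed_dim", "description"),
    ("date_dim", "date_id"),
]
print(find_collisions(star))
```

Here the three key columns and description are reported. The keys can usually just be taken from the fact table, so only genuine attribute clashes such as description need renaming.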

So my questions are:

Is this approach OK, or is it just going to be an abstraction that hurts performance and gives me nothing but much nicer syntax?
What's the best way to code-gen these views, eliminating duplicate column names (even if a view later needs to be tweaked by hand), given that all the appropriate PKs and FKs are in place? Should I just write some SQL to pull it out of INFORMATION_SCHEMA, or is there a good example already available?
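For the code-gen question, one possible shape for such a generator is sketched below in Python. It assumes the metadata has already been fetched: column lists as from INFORMATION_SCHEMA.COLUMNS (ordered by ORDINAL_POSITION), and one (fact column, dimension table, dimension column) triple per foreign key, which in SQL Server could be assembled from INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS joined to INFORMATION_SCHEMA.KEY_COLUMN_USAGE (that query is not shown). The function name, the description and calendar_year columns, and the aliases are hypothetical. Dimension key columns are dropped because the fact already carries them, and any remaining duplicate name is prefixed with its table alias.

```python
from collections import Counter


def generate_star_view(
    view_name: str,
    fact: str,
    columns: dict[str, list[str]],
    foreign_keys: list[tuple[str, str, str]],
    aliases: dict[str, str],
) -> str:
    """Return a CREATE VIEW skeleton joining a fact table to its dimensions.

    columns:      table -> column names, e.g. from INFORMATION_SCHEMA.COLUMNS
    foreign_keys: (fact_column, dim_table, dim_column), one entry per FK on the fact
    aliases:      table -> short alias used in the generated SQL
    """
    # Dimension key columns repeat the fact's FK columns, so leave them out.
    dim_keys = {(dim, dim_col) for _, dim, dim_col in foreign_keys}
    selected = [(fact, col) for col in columns[fact]]
    for _, dim, _ in foreign_keys:
        selected += [(dim, col) for col in columns[dim] if (dim, col) not in dim_keys]

    # Any name that still occurs more than once is prefixed with its table alias.
    counts = Counter(col for _, col in selected)
    select_list = [
        f"{aliases[t]}.{col} AS {aliases[t]}_{col}" if counts[col] > 1 else f"{aliases[t]}.{col}"
        for t, col in selected
    ]
    joins = [
        f"INNER JOIN {dim} AS {aliases[dim]}\n"
        f"    ON {aliases[dim]}.{dim_col} = {aliases[fact]}.{fact_col}"
        for fact_col, dim, dim_col in foreign_keys
    ]
    return (
        f"CREATE VIEW {view_name} AS\nSELECT "
        + "\n    ,".join(select_list)
        + f"\nFROM {fact} AS {aliases[fact]}\n"
        + "\n".join(joins)
    )


sql = generate_star_view(
    "MODEL_SYSTEM_1",
    "fact",
    columns={
        "fact": ["cdim_dim_id", "ncdim_dim_id", "date_id", "y"],
        "conformed_dim": ["cdim_dim_id", "x", "description"],
        "nonconformed_dim": ["ncdim_dim_id", "z", "description"],
        "date_dim": ["date_id", "calendar_year"],
    },
    foreign_keys=[
        ("cdim_dim_id", "conformed_dim", "cdim_dim_id"),
        ("ncdim_dim_id", "nonconformed_dim", "ncdim_dim_id"),
        ("date_id", "date_dim", "date_id"),
    ],
    aliases={"fact": "f", "conformed_dim": "cdim", "nonconformed_dim": "dim", "date_dim": "ddim"},
)
print(sql)
```

The output uses the same comma-first layout as the queries above and is meant as a skeleton to review by hand. Role-playing dimensions (the same table joined through two FKs) would need a distinct alias per FK, which this sketch does not handle.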

Edit: I have tested it, and performance seems the same, even on the bigger processes, and even when joining multiple stars that each use these views.

The automation is mainly because there are a number of these stars in the data warehouse, and the designers have defined the PKs/FKs properly, but I don't want to have to pick through all the tables or the documentation. I wrote a script that generates the view (it also generates abbreviations for the tables). It works well for generating the skeleton automagically from INFORMATION_SCHEMA, and the skeleton can then be tweaked before the view is actually created.

If anyone wants the code, I could probably publish it here.
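The post does not include the script, so the following is only a guess at how the table-abbreviation step could work: take the first letter of each underscore-separated word of the table name and append a counter when an alias is already taken. The function name and the customer_dim and currency_dim tables are hypothetical.

```python
def abbreviate(tables: list[str]) -> dict[str, str]:
    """Give each table a short alias made from the initials of its words.

    A clash with an earlier alias is resolved by appending 2, 3, ...
    """
    aliases: dict[str, str] = {}
    used: set[str] = set()
    for table in tables:
        base = "".join(word[0] for word in table.lower().split("_") if word)
        alias, n = base, 2
        while alias in used:
            alias, n = f"{base}{n}", n + 1
        used.add(alias)
        aliases[table] = alias
    return aliases


table_aliases = abbreviate(
    ["fact", "conformed_dim", "nonconformed_dim", "date_dim", "customer_dim", "currency_dim"]
)
print(table_aliases)
# {'fact': 'f', 'conformed_dim': 'cd', 'nonconformed_dim': 'nd', 'date_dim': 'dd', 'customer_dim': 'cd2', 'currency_dim': 'cd3'}
```

A real generator would probably also want to avoid aliases that happen to be reserved words (a table named audit_status, for example, would get as).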


无可置疑 2024-07-13 13:41:47

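I have used this technique in several data warehouses I manage. I haven't noticed any performance degradation when running reports against the views compared with querying the tables directly, but I have never done a detailed analysis.
I created the views with the designer in SQL Server Management Studio and didn't use any automated method. I can't imagine the schema changing often enough for automating it to be worthwhile anyway. You would probably spend as long tweaking the generated results as it takes to drag all the tables onto the view in the first place!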

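To remove ambiguity, a good approach is to prefix each column name with the name of the dimension it belongs to. This helps both report writers and anyone running ad hoc queries.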
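A generator could apply this naming rule roughly as follows. This Python sketch assumes dimension tables are named with a _dim suffix that is stripped before prefixing, and it leaves alone columns that already carry the prefix; customer_dim, its alias cust, and its columns are hypothetical.

```python
def dimension_prefixed_columns(
    dimension: str, alias: str, columns: list[str], suffix: str = "_dim"
) -> list[str]:
    """Select-list entries that prefix each column with its dimension's name.

    The dimension name is the table name minus `suffix` (an assumed naming
    convention); columns already starting with that name are kept as-is.
    """
    name = dimension[: -len(suffix)] if suffix and dimension.endswith(suffix) else dimension
    entries = []
    for col in columns:
        out = col if col.startswith(f"{name}_") else f"{name}_{col}"
        entries.append(f"{alias}.{col} AS {out}")
    return entries


for entry in dimension_prefixed_columns("customer_dim", "cust", ["name", "city", "customer_code"]):
    print(entry)
# cust.name AS customer_name
# cust.city AS customer_city
# cust.customer_code AS customer_code
```

Unlike prefixing only on collision, this always-prefix policy keeps names stable when a new dimension is added to the star later.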
