Typical Kimball star-schema data warehouse: are model views feasible, and how to code-gen them?

Posted on 2024-07-06 13:41:46

I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative):

SELECT cdim.x
    ,SUM(fact.y) AS y
    ,dim.z
FROM fact
INNER JOIN conformed_dim AS cdim
    ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
    ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
    ON ddim.date_id = fact.date_id
WHERE fact.date_id = @date_id
GROUP BY cdim.x
    ,dim.z

I'm thinking of replacing it with a view (MODEL_SYSTEM_1, say), so that it becomes:

SELECT m.x
    ,SUM(m.y) AS y
    ,m.z
FROM MODEL_SYSTEM_1 AS m
WHERE m.date_id = @date_id
GROUP BY m.x
    ,m.z

But the view MODEL_SYSTEM_1 would have to contain unique column names. I'm also concerned about how the optimizer would handle this: the view would span a whole star, so I'm not sure the predicates in the WHERE clause, which touch different facts and dimensions, would still be optimized properly. And views cannot be parameterized (boy, wouldn't that be cool!).

So my questions are -

  1. Is this approach OK, or is it just an abstraction that hurts performance and gives me nothing but nicer syntax?

  2. What's the best way to code-gen these views, eliminating duplicate column names (even if the view later needs to be tweaked by hand), given that all the appropriate PKs and FKs are in place? Should I just write some SQL to pull it out of INFORMATION_SCHEMA, or is there a good example already available?

Edit: I have tested it, and the performance seems the same, even on the bigger processes - even joining multiple stars which each use these views.

The automation is mainly because there are a number of these stars in the data warehouse, and the designers have set up the FKs and PKs properly, but I don't want to have to pick through all the tables or the documentation. I wrote a script to generate the view (it also generates abbreviations for the tables). It works well for generating the skeleton automagically from INFORMATION_SCHEMA, and the skeleton can then be tweaked before committing the creation of the view.
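A skeleton generator along those lines might look like the sketch below. It is not the script mentioned above. To stay runnable anywhere, it uses Python's built-in sqlite3 module and SQLite's `PRAGMA foreign_key_list` and `PRAGMA table_info` as stand-ins for the INFORMATION_SCHEMA views. The function name `generate_star_view`, the `f`/`d1`/`d2` aliases, and the table-name prefix convention are all assumptions made for illustration, and it only handles single-column foreign keys that name the referenced column explicitly.

```python
import sqlite3


def generate_star_view(conn, fact_table, view_name):
    """Build CREATE VIEW text joining fact_table to every table it references.

    Assumes single-column foreign keys that name the referenced column.
    """
    used = set()

    def unique(name):
        # Append a numeric suffix if the candidate column name is already taken
        # (for example when one dimension is referenced twice).
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        return candidate

    def columns(table):
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

    select_list = []
    for col in columns(fact_table):
        alias = unique(f"{fact_table}_{col}")
        select_list.append(f'f."{col}" AS "{alias}"')

    joins = []
    # PRAGMA foreign_key_list rows are
    # (id, seq, table, from, to, on_update, on_delete, match).
    fks = conn.execute(f'PRAGMA foreign_key_list("{fact_table}")').fetchall()
    ordered = sorted(fks, key=lambda row: row[0])
    for n, (_id, _seq, dim_table, from_col, to_col, *_rest) in enumerate(ordered, start=1):
        alias = f"d{n}"
        joins.append(
            f'INNER JOIN "{dim_table}" AS {alias} ON {alias}."{to_col}" = f."{from_col}"'
        )
        for col in columns(dim_table):
            name = unique(f"{dim_table}_{col}")
            select_list.append(f'{alias}."{col}" AS "{name}"')

    return (
        f'CREATE VIEW "{view_name}" AS\nSELECT '
        + ",\n       ".join(select_list)
        + f'\nFROM "{fact_table}" AS f\n'
        + "\n".join(joins)
    )
```

The function only returns text, so the generated DDL can be reviewed and hand-edited before it is executed, which matches the "generate a skeleton, then tweak" workflow described above.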

If anyone wants the code, I could probably publish it here.

Comments (3)

无可置疑 2024-07-13 13:41:47
  1. I’ve used this technique on several data warehouses I look after. I have not noticed any performance degradation when running reports against the views versus querying the tables directly, but I have never performed a detailed analysis.

  2. I created the views using the designer in SQL Server Management Studio and did not use any automated approach. I can’t imagine the schema changing often enough that automating it would be worthwhile anyhow. You might spend as long tweaking the results as it would have taken to drag all the tables onto the view in the first place!

To remove ambiguity, a good approach is to prefix each column name with the name of the dimension it belongs to. This is helpful to the report writers and to anyone running ad hoc queries.
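As a rough illustration of the prefixing convention, the sketch below builds a tiny star shaped like the one in the question and puts a MODEL_SYSTEM_1 view over it. It runs on Python's built-in sqlite3 module rather than SQL Server, so it says nothing about SQL Server's optimizer. The sample rows, the `calendar_date` column, and the helper names `build_star` and `totals` are made up for the example.

```python
import sqlite3


def build_star(conn):
    """Create a toy star schema plus a view whose columns carry table prefixes."""
    conn.executescript("""
        CREATE TABLE conformed_dim (cdim_dim_id INTEGER PRIMARY KEY, x TEXT);
        CREATE TABLE nonconformed_dim (ncdim_dim_id INTEGER PRIMARY KEY, z TEXT);
        CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE fact (
            cdim_dim_id INTEGER REFERENCES conformed_dim (cdim_dim_id),
            ncdim_dim_id INTEGER REFERENCES nonconformed_dim (ncdim_dim_id),
            date_id INTEGER REFERENCES date_dim (date_id),
            y REAL
        );
        INSERT INTO conformed_dim VALUES (1, 'a'), (2, 'b');
        INSERT INTO nonconformed_dim VALUES (1, 'p'), (2, 'q');
        INSERT INTO date_dim VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
        INSERT INTO fact VALUES
            (1, 1, 20240101, 10.0),
            (1, 1, 20240101, 5.0),
            (2, 2, 20240101, 7.0),
            (1, 2, 20240102, 3.0);

        -- Each output column is prefixed with its source table, so names stay unique.
        CREATE VIEW MODEL_SYSTEM_1 AS
        SELECT fact.date_id       AS date_id,
               fact.y             AS fact_y,
               cdim.x             AS conformed_dim_x,
               dim.z              AS nonconformed_dim_z,
               ddim.calendar_date AS date_dim_calendar_date
        FROM fact
        INNER JOIN conformed_dim AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id
        INNER JOIN nonconformed_dim AS dim ON dim.ncdim_dim_id = fact.ncdim_dim_id
        INNER JOIN date_dim AS ddim ON ddim.date_id = fact.date_id;
    """)


# The original hand-written join, with ORDER BY added so results compare reliably.
DIRECT_SQL = """
    SELECT cdim.x, SUM(fact.y) AS y, dim.z
    FROM fact
    INNER JOIN conformed_dim AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id
    INNER JOIN nonconformed_dim AS dim ON dim.ncdim_dim_id = fact.ncdim_dim_id
    INNER JOIN date_dim AS ddim ON ddim.date_id = fact.date_id
    WHERE fact.date_id = ?
    GROUP BY cdim.x, dim.z
    ORDER BY cdim.x, dim.z
"""

# The same report written against the view.
VIEW_SQL = """
    SELECT m.conformed_dim_x, SUM(m.fact_y) AS y, m.nonconformed_dim_z
    FROM MODEL_SYSTEM_1 AS m
    WHERE m.date_id = ?
    GROUP BY m.conformed_dim_x, m.nonconformed_dim_z
    ORDER BY m.conformed_dim_x, m.nonconformed_dim_z
"""


def totals(conn, sql, date_id):
    """Run one of the report queries for a single date and return all rows."""
    return conn.execute(sql, (date_id,)).fetchall()
```

Calling `totals(conn, DIRECT_SQL, 20240101)` and `totals(conn, VIEW_SQL, 20240101)` on a connection prepared with `build_star` gives a quick check that the view-based report returns the same rows as the hand-written join.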

凉栀 2024-07-13 13:41:47

Make the view or views into one or more summary fact tables and materialize them. These only need to be refreshed when the main fact table is refreshed. The materialized views will be faster to query, and this can be a win if you have a lot of queries that can be satisfied by the summary.

You can use the data dictionary or information schema views to generate SQL to create the tables if you have a large number of these summaries or wish to change them about frequently.

However, I would guess that it's not likely that you would change these very often so auto-generating the view definitions might not be worth the trouble.
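The summary-table idea can be sketched as a plain table that is rebuilt whenever the main fact table is loaded. The example below uses Python's built-in sqlite3 module, which has no materialized views, so the summary is an ordinary table. The name `fact_summary`, its columns, and the function `refresh_summary` are assumptions for illustration, built on the table names from the question.

```python
import sqlite3

# Hypothetical summary table at the (date_id, x, z) grain.
SUMMARY_DDL = """
CREATE TABLE IF NOT EXISTS fact_summary (
    date_id INTEGER NOT NULL,
    x       TEXT    NOT NULL,
    z       TEXT    NOT NULL,
    y       REAL    NOT NULL,
    PRIMARY KEY (date_id, x, z)
)
"""

# Pre-aggregate the detail rows once, so reports do not repeat the joins.
REFRESH_SQL = """
INSERT INTO fact_summary (date_id, x, z, y)
SELECT fact.date_id, cdim.x, dim.z, SUM(fact.y)
FROM fact
INNER JOIN conformed_dim AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim ON dim.ncdim_dim_id = fact.ncdim_dim_id
GROUP BY fact.date_id, cdim.x, dim.z
"""


def refresh_summary(conn):
    """Rebuild fact_summary from the detail fact table (full refresh)."""
    conn.execute(SUMMARY_DDL)
    # The context manager commits both statements together, or rolls both back
    # if either one raises.
    with conn:
        conn.execute("DELETE FROM fact_summary")
        conn.execute(REFRESH_SQL)
```

A full delete-and-reload is the simplest possible refresh. On a large warehouse an incremental refresh of only the newly loaded dates would likely be preferable, but that depends on how the fact table itself is loaded.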

画中仙 2024-07-13 13:41:47

If you happen to use MS SQL Server, you could try an inline table-valued UDF, which is as close to a parameterized view as it gets.
