如何避免星型模式中的复杂联接?

发布于 2024-09-07 11:03:14 字数 149 浏览 12 评论 0原文

我的事实表保存了用户在他参加的课程中的分数。我必须在报告中显示的课程的一些详细信息来自多个表(在实际的 OLTP 数据库中)。
我是否在维度表中创建该课程条目的非标准化版本?
或者我只是将事实表直接连接到课程表,连接到描述该课程的其他表(课程类型、创建该课程的教师等)

My fact table holds a user score in a course he took. Some of the details of the course, which I have to show on the report, comes from more then one table (in the actual OLTP db).
Do I create a none normalized version of that course entry in a dimension table?
Or do I just join the fact table directly to the course table join to the other tables that describe this course (course_type,faculty who created this course etc)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

分分钟 2024-09-14 11:03:14

雪花表或桥接表确实使联接变得更加复杂,不仅从编码的角度来看,它还使 BI 用户变得不那么简单。

在大多数情况下,我会将它们直接放入现有或附加维度表中。

例如,您有一个分数事实表,其中在某个维度中包含用户详细信息,该维度可能包含也可能不包含用户的人口统计数据(也许它只是一个桥梁)。有时最好将人口统计信息分开。因此,即使性别和年龄可能与用户实体相关联,但在维度模型中,这些可能是单独的维度或集中到单个维度中 - 一切都取决于使用场景。

也许你的分数附属于一个州,而州有区域(雪花)。直接链接区域维度而不是通过州维度进行分析可能会更有效。

我想你会发现维度模型是一种非常实用的非规范化方法。不可协商的主要事情是事实——之后维度的选择很大程度上取决于数据的行为、您对常见使用场景的预见——并避免陷入维度太少和维度太多的问题。

Snowflaking or bridge tables do make the joins more complicated, and not just from a coding perspective, it also makes it less simple for BI users.

In most cases, I would put these directly in existing or additional dimension tables.

For instance, you have a scores fact table, which has the user details in a dimension which may or may not hold demographics on the user (perhaps it's only a bridge). Sometimes it is better to split out demographic information. So even though the gender and age might be associated with a user entity, in the dimensional model, these might be individual dimensions or lumped into a single dimension - all depending on the usage scenarios.

Perhaps your scores are attached to a state and states have regions (snowflake). It might be far more efficient for analysis to have the region dimension linked directly instead of going through the state dimension.

I think what you will find is that the dimensional model is a very pragmatic denormalization approach. The main things which are non-negotiable are the facts - after that the choice of dimensions is very much informed by the behavior of the data, your foresight for common usage scenarios - and avoiding falling into the too few dimensions and too many dimensions problems.

酷炫老祖宗 2024-09-14 11:03:14

也许我不明白你的问题,但星型模式中的事实表应该连接到它周围的维度表。
如果您不想进行联接,只需创建一个视图,然后使用该视图进行报告即可。

如果您要发布模型(架构),则评论/帮助会更容易。

Maybe I do not understand your question, but a fact table in a star schema is supposed to be joined to dimension tables surrounding it.
If you do not feel like making joins, simply create a view, and use the view for reporting.

If you were to post a model (schema), it would be easier to comment/help.

转身泪倾城 2024-09-14 11:03:14

将多个维度合并在一起是一种常见的做法,为了性能而牺牲标准化。当您的典型查询需要所有维度在一起时(而不是针对不同用例使用不同的位),通常会执行此操作。

记住,虽然连接开销有所减少,但也存在一些缺点:

  • 灵活性损失,随着仓库的扩展,这可能会阻碍开发
  • 全表扫描需要更长的时间(在传统的基于行的 RDBMS 中,例如 SQL Server)
  • 磁盘空间消耗

另请 必须单独考虑每种情况。

如果您的 RDBMS 提供了创建物化视图的功能,那么也可能值得考虑。

It is a common practice to consolidate several dimensions together, sacrificing normalization in favor of performance. This is usually done when your typical query will need all dimensions together (as opposed to using different bits for different use cases).

Also remember that while you receive a reduction in join overhead, there are some drawbacks:

  • Loss of flexibility, which might hinder development as the warehouse expands
  • Full table scans take longer (in traditional row-based RDBMS such as SQL Server)
  • Disk space consumption

You will have to consider each case separately.

It might be worthwhile to also consider the option of creating a materialized view, if such ability is offered by your RDBMS.

兔姬 2024-09-14 11:03:14

我们通常使用雪花模式作为物理 DWH 设计,但添加一个报告视图层,将雪花模式展平为星型模式。

这样,您的 OLAP 多维数据集就会变得更加简单并且更易于管理。

We commonly have a snowflake schema as the physical DWH design, but add a reporting view layer that flattens the snowflake schema into a star schema.

This way your OLAP cube becomes much simpler adn easier to manage.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文