当前位置：文江博客话题详情

Data Vault 模型何时适合数据仓库模型？

发布于 2024-09-05 09:44:35 字数 201 浏览 13 评论 0原文

我最近发现了对“数据仓库建模”作为数据仓库模型的引用。我之前见过的模特是Inmon和Kimball。作者提到了由于需要连接而可能出现的性能问题。它看起来是一个不错的模型，但我想知道其中的陷阱。网上有经验报告吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一花一树开 2024-09-12 09:44:35

多年来，我们一直在使用 Data Vault 的自行修改，称为“ Link Modelling'，只有实体和链接；从 Neo4j 汲取原理，但在 SQL 数据库中实现。

链接建模和数据仓库都是与 Kimball/Inmon 模型截然不同的思维方式。

我下面的评论涉及使用以下结构构建的系统：一个临时暂存数据库、一个 DWH，然后从 DWH 构建多个集市。还有其他方法来构建 DWH 解决方案，但这非常典型。

使用 Kimball/Inmon

数据在进入 DWH 的过程中被清理，但有时会在进入临时数据库的过程中应用
业务规则和 MDM（通常）应用于登台数据库和 DWH
集市通常是特定于主题领域的

通过数据仓库/链接建模

数据落地这些数据同样未经
清理地传递到 DWH，但以实体/链接形式存储。
数据清理、MDM 和业务规则在 DWH 和集市之间应用。
集市基于主题领域的特定需求（同上）。
对于我们来说，我们经常（但并非总是）构建 Kimball 星型模式风格的集市，因为最终用户可以轻松理解这些集市的数据结构。

链接建模 DWH 发挥作用的场合如下（使用 Kimball 术语来表达问题）

有时，用户会询问“为什么特定数字具有此功能”价值？'。在传统的 Kimball/Inmon 中，数据在传入的过程中就被清理了，无法知道原始值是什么。链接模型在 DWH 中具有原始数据。
当不存在链接多个维度的交易记录时，并且需要能够报告完整的数据集，因此例如提出诸如“特定经纪人出售的多少保单没有支付索赔交易”之类的问题？'。
在类型 2 Kimball 或 Inmon DWH 中应用 MDM 会导致大量类型 2 更改记录写入维度，维度通常包含所有数据值，因此存在大量重复数据。使用链接模型/数据仓库，新的维度值只会导致在链接表中创建新的类型 2 链接，该链接表仅具有实体表的外键。在 Kimball DWH 中，通常可以通过缓慢变化的维度和快速变化的维度，这是一个公平的解决方法。
在保险和其他需要能够生成“截至日期”报告的行业中，事实表也会慢慢变化，针对 2 类事实记录的 2 类维度跟踪是一场噩梦。
从开发的角度来看，向大型 Kimball 维度添加新列需要小心谨慎，并且考虑回填也很重要，但对于链接模型，向实体添加额外的列相对来说是微不足道的。

在金博尔方法论中，总有办法解决这些问题，但它们需要仔细思考，有时还需要一些突破。

从我们的角度来看，链接建模几乎没有什么缺点。

我与任何营销/生产 Kimball/Inmon 或 Data Vault 方法的公司都没有关系。

We have been using a home-grown modification to Data Vault for a number of years, called 'Link Modelling', which only has entities and links; drawing principles from neo4j, but implementing in a SQL database.

Both Link Modelling and Data Vault are very different ways of thinking to Kimball/Inmon models.

My comments below relate to a system built with the follow structure: a temporary staging database, a DWH, then a number of marts build from the DWH. There are other ways to architect a DWH solution, but this is quite typical.

With Kimball/Inmon

Data is cleaned on the way into the DWH, but sometimes applied on the way into the staging database
Business rules and MDM are (generally) applied between the staging db and the DWH
The marts are often subject area specific

With Data Vault/Link Modelling

Data is landed unchanged in staging
These data are passed through to the DWH also uncleaned, but stored in an entity/link form
Data cleansing, MDM and business rules are applied between the DWH and the marts.
Marts are based on subject area specific needs (same as above).
For us, we would often (but not always) build Kimball Star Schema style Marts, as the end users understand the data structures of these easily.

The occasions where a Link Modelled DWH comes into its own, are the following (using Kimball terminology to express the issues)

Upon occasion, there will be queries from the users asking 'why is a specific number having this value?'. In traditional Kimball/Inmon, data is cleansed on the way in, there is no way to know what the original value was. Link Model has the original data in the DWH.
When no transaction records exist that link a number of dimensions, and it is required to be able to report on the full set of data, so e.g. ask questions like 'How many insurance policies that were sold by a particular broker have no claim transactions paid?'.
The application of MDM in a type 2 Kimball or Inmon DWH can cause massive numbers of type 2 change records to be written to Dimensions, which often contain all the data values, so there is a lot of duplication of data. With a Link Model/Data Vault, a new dimensional value will just cause new type 2 links to be created in a link table, which only have foreign keys to entity tables. This is often overcome in Kimball DWH by having a slowly changing dimension and a fast changing dimension, which is a fair workaround.
In Insurance and other industries where there is the need to be able to produce 'as at date' reports, Fact tables will be slowly changing as well, type 2 dimension tracking against type 2 fact records are a nightmare.
From a development point of view, adding a new column to a large Kimball dimension needs to be done carefully and consideration of back-populating is important, but with a Link Model, adding an extra column to an Entity is relatively trivial.

There are always ways around these in Kimball methodology, but they require some careful thought and sometimes some jumping through hoops.

From our perspective, there is little downside to the Link Modelling.

I am not connected with any of the companies marketing/producing Kimball/Inmon or Data Vault methodologies.

回复收藏 0 原文