插入星型模式
我读过很多关于星型模式、事实/定义表、选择语句来快速报告数据的内容,但是将数据输入星型模式的问题对我来说似乎很遥远。 “理论上”如何将数据输入星型数据库?同时维护事实表。是带有 20 个参数的巨型存储过程中的一系列 INSERT INTO 语句,这是我唯一的选择(以及如何填充事实表)。 非常感谢。
I've read a lot about star-schema's, about fact/deminsion tables, select statements to quickly report data, however the matter of data entry into a star-schema seems aloof to me. How does one "theoretically" enter data into a star-schema db? while maintaining the fact table. Is a series of INSERT INTO statement within giant stored proc with 20 params my only option (and how to populate the fact table).
Many thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
首先从尺寸开始——一一尺寸。使用 ECCD(提取、清理、整合、交付)方法。
确保每个维度都有一个 BusinessKey 来唯一标识维度行描述的“对象”,例如一个人的电子邮件。
加载维度后,准备键查找管道。一般来说,对于每个维度表,您可以准备一个键查找表(BusinessKey、PrimaryKey)。一些设计人员选择直接查找维度表,但键查找通常可以轻松缓存到内存中,从而加快事实加载速度。
也将 ECCD 用于事实数据。 ECC 部分发生在暂存区域,您可以根据需要为 ECC 的每个步骤选择(辅助)表或平面文件。
交付事实表时,将事实行中的每个 BusinessKey 替换为从键查找表中获取的匹配 PrimaryKey。将所有 BusinessKey 替换为其匹配的 PrimaryKey 后,将该行插入到事实表中。
不要浪费你的时间,使用 ETL 工具。您可以免费下载 Pentaho Kettle(社区版)——它拥有实现这一目标所需的一切。
Start with dimensions first -- one by one. Use ECCD (Extract, Clean, Conform, Deliver) approach.
Make sure that each dimension has a BusinessKey that uniquely identifies the "object" that a dimension row describes -- like email for a person.
With dimensions loaded, prepare key-lookup pipeline. In general, for each each dimension table you can prepare a key lookup table (BusinessKey, PrimaryKey). Some designers choose to lookup the dimension table directly, but the key-lookup can be often easily cached into memory which results in faster fact loading.
Use ECCD for fact data too. The ECC part happens in the staging area, you can choose (helper) tables or flat files for each step of the ECC, as you prefer.
While delivering fact tables, replace each BusinessKey in the fact row with the matching PrimaryKey that you get from a key-lookup table. Once all BusinessKeys are replaced with their matching PrimaryKeys, insert the row into the fact table.
Do not waste you time, use ETL tool. You can download Pentaho Kettle (community edition) for free -- it has everything one needs to achieve this.
您通常不会以与普通表单相同的方式将数据插入星型模式 - 即使用在单个事务中插入/更新所有适当表的存储过程。请记住,星型模式通常是一种只读的非规范化数据模型 - 它(很少)以事务方式处理,并且通常从已经非规范化的平面数据中加载 - 通常每个星形一个平面文件。
正如 Damir 指出的那样,通常,您加载所有维度(处理缓慢变化等),然后加载事实,加入适当的当前维度以查找维度 ID(使用业务键)。
You typically do not insert data into a star schema in the same way you might into a normal form - i.e. with a stored procedure which inserts/updated all the appropriate tables within a single transaction. Remember that the star schema is typically a read-only denormalized model of data - it is (rarely) treated transactionally, and is typically loaded from data that is already denormalized flat - usually one flat file per star.
As Damir points out, typically, you load all the dimensions (handle the slowly changing etc), then load the facts, joining to the appropriate current dimensions to find the dimension IDs (using the business keys).