Best practice for populating fact and dimension tables from a transactional flat database

Posted 2024-08-26 00:18:46

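I want to populate a star schema / cube in SSIS / SSAS.

I have prepared all the dimension tables and the fact table, primary keys, etc.

The source is a "flat" (item-level) table, and my problem now is how to split it up and load it from that one table into the corresponding tables.

I did some googling but couldn't find a satisfying solution. One would think this is a fairly common problem/situation in BI development?!

Thanks, Alex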


一杯敬自由 2024-09-02 00:18:46


For a start, it depends on whether you want to do a simple initial data transfer or something more sophisticated (e.g. incremental). I'm going to assume you're doing an initial data transfer.

Say your item table has the following columns: id, cat1, cat2, cat3, cat4, ... Assuming the dimension tables for categories 1-4 each have the columns id, cat_name, you can load dim_cat1 (the dimension table for item category 1) as follows:

insert into dim_cat1 (cat_name)
  select distinct cat1 from item_table;
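To see what that statement does, here is a minimal sketch that runs it against an in-memory SQLite database. SQLite is only a convenient stand-in because it ships with Python. The sample rows are hypothetical, and so is the `unique` constraint on `cat_name`. Declaring `id integer primary key` is how SQLite assigns the automatically generated IDs the answer assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical flat item-level source table
    create table item_table (id integer primary key, cat1 text, cat2 text);
    -- 'integer primary key' makes SQLite generate the id automatically
    create table dim_cat1 (id integer primary key, cat_name text unique);
""")
conn.executemany(
    "insert into item_table (id, cat1, cat2) values (?, ?, ?)",
    [(1, "Books", "Fiction"), (2, "Books", "Poetry"), (3, "Music", "Jazz")],
)

# The dimension load from the answer: one row per distinct category value.
conn.execute("insert into dim_cat1 (cat_name) select distinct cat1 from item_table")

names = [row[0] for row in conn.execute("select cat_name from dim_cat1 order by cat_name")]
print(names)  # ['Books', 'Music']
```

The two "Books" items collapse into a single dimension row, which is the point of `select distinct`.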

You can do the same for all of the other categories/dimension tables. I'm assuming your dimension tables have automatically generated IDs. Now, to load the fact table:
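Doing "the same for all of the other categories" is mechanical, so it can be generated in a loop. The sketch below again uses SQLite, with hypothetical sample rows. It also uses a hypothetical helper `load_dimension` and the naming pattern `dim_<column>`. The helper creates each table only so the example is self-contained; with your prepared tables you would keep just the `insert`.

```python
import sqlite3

# Hypothetical flat table with four category columns, as in the answer.
CATEGORY_COLUMNS = ["cat1", "cat2", "cat3", "cat4"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table item_table (id integer primary key, "
    + ", ".join(f"{c} text" for c in CATEGORY_COLUMNS)
    + ")"
)
conn.executemany(
    "insert into item_table values (?, ?, ?, ?, ?)",
    [
        (1, "Books", "Fiction", "EU", "New"),
        (2, "Books", "Poetry", "EU", "Used"),
        (3, "Music", "Jazz", "US", "New"),
    ],
)


def load_dimension(conn, column):
    """Create dim_<column> with an auto-generated id and fill it with distinct values."""
    dim = f"dim_{column}"
    # Table/column names are interpolated, not bound: only pass trusted, fixed names.
    conn.execute(f"create table {dim} (id integer primary key, cat_name text unique)")
    conn.execute(f"insert into {dim} (cat_name) select distinct {column} from item_table")


for col in CATEGORY_COLUMNS:
    load_dimension(conn, col)

counts = {
    col: conn.execute(f"select count(*) from dim_{col}").fetchone()[0]
    for col in CATEGORY_COLUMNS
}
print(counts)  # {'cat1': 2, 'cat2': 3, 'cat3': 2, 'cat4': 2}
```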

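insert into fact_table (id, cat1_id, cat2_id, cat3_id, cat4_id, ...)
  -- qualify the item id and take one surrogate key per dimension
  select it.id, dc1.id, dc2.id, dc3.id, dc4.id, ...
    from item_table it
      -- inner joins skip items whose category is NULL; use left join to keep them
      join dim_cat1 dc1 on dc1.cat_name = it.cat1
      join dim_cat2 dc2 on dc2.cat_name = it.cat2
      join dim_cat3 dc3 on dc3.cat_name = it.cat3
      join dim_cat4 dc4 on dc4.cat_name = it.cat4
 ...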
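Here is a runnable end-to-end version of the fact-table load. Every selected column is qualified and each dimension has its own alias. It runs on SQLite as a stand-in, with hypothetical sample rows and exactly four category columns. The final query joins the facts back to the dimensions as a sanity check. It should reproduce the flat source rows exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item_table (id integer primary key, cat1 text, cat2 text, cat3 text, cat4 text);
    create table dim_cat1 (id integer primary key, cat_name text unique);
    create table dim_cat2 (id integer primary key, cat_name text unique);
    create table dim_cat3 (id integer primary key, cat_name text unique);
    create table dim_cat4 (id integer primary key, cat_name text unique);
    create table fact_table (
        id integer primary key,
        cat1_id integer references dim_cat1(id),
        cat2_id integer references dim_cat2(id),
        cat3_id integer references dim_cat3(id),
        cat4_id integer references dim_cat4(id)
    );
""")
conn.executemany(
    "insert into item_table values (?, ?, ?, ?, ?)",
    [
        (1, "Books", "Fiction", "EU", "New"),
        (2, "Books", "Poetry", "EU", "Used"),
        (3, "Music", "Jazz", "US", "New"),
    ],
)

# Dimension loads first, so the fact load can look up their generated ids.
for n in range(1, 5):
    conn.execute(f"insert into dim_cat{n} (cat_name) select distinct cat{n} from item_table")

# Fact load: replace each category name with its dimension's surrogate key.
conn.execute("""
    insert into fact_table (id, cat1_id, cat2_id, cat3_id, cat4_id)
      select it.id, dc1.id, dc2.id, dc3.id, dc4.id
        from item_table it
          join dim_cat1 dc1 on dc1.cat_name = it.cat1
          join dim_cat2 dc2 on dc2.cat_name = it.cat2
          join dim_cat3 dc3 on dc3.cat_name = it.cat3
          join dim_cat4 dc4 on dc4.cat_name = it.cat4
""")

# Round-trip check: joining facts back to dimensions should rebuild the flat rows.
rebuilt = conn.execute("""
    select f.id, d1.cat_name, d2.cat_name, d3.cat_name, d4.cat_name
      from fact_table f
        join dim_cat1 d1 on d1.id = f.cat1_id
        join dim_cat2 d2 on d2.id = f.cat2_id
        join dim_cat3 d3 on d3.id = f.cat3_id
        join dim_cat4 d4 on d4.id = f.cat4_id
     order by f.id
""").fetchall()
original = conn.execute("select * from item_table order by id").fetchall()
print(rebuilt == original)  # True
```

A round-trip check like this is cheap to keep around after an initial load. If any category value failed to match a dimension row, the inner joins would drop that item and the comparison would fail.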

If you have a substantial amount of data, it might make sense to create indexes on the category names in the item_table and maybe the dimension tables.
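As a small illustration of that indexing advice, the sketch below indexes the join columns on both sides: the flat table's category column and the dimension's `cat_name`. It then asks SQLite for its query plan. The index names are hypothetical. The plan text differs between SQLite versions and is not SQL Server's plan, so treat the printout as illustrative only. SQL Server's own `create index` / `create unique index` statements look much the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item_table (id integer primary key, cat1 text, cat2 text, cat3 text, cat4 text);
    create table dim_cat1 (id integer primary key, cat_name text);
""")

# Index the lookup columns used by the fact-load joins (names are hypothetical).
conn.execute("create index ix_item_table_cat1 on item_table (cat1)")
# A unique index also guards against duplicate category names in the dimension.
conn.execute("create unique index ux_dim_cat1_cat_name on dim_cat1 (cat_name)")

# Show how SQLite plans the join; the last column of each row is the detail text.
plan = conn.execute("""
    explain query plan
    select it.id, dc1.id
      from item_table it
      join dim_cat1 dc1 on dc1.cat_name = it.cat1
""").fetchall()
for row in plan:
    print(row[-1])

indexes = sorted(
    row[0] for row in conn.execute("select name from sqlite_master where type = 'index'")
)
print(indexes)  # ['ix_item_table_cat1', 'ux_dim_cat1_cat_name']
```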

By the way, this is a database-independent answer; I don't work with SSIS/SSAS. You might have tools available that streamline parts of this process for you, but it's really not that difficult or time-consuming to write in plain SQL.
