从事务性平面数据库填充事实表和维度表的最佳实践
我想在 SSIS / SSAS 中填充星型模式/多维数据集。
我准备了所有维度表和事实表、主键等。
源是一个“平面”(项目级别)表,我现在的问题是如何拆分它 并将其从一个放入相应的表中。
我做了一些谷歌搜索,但找不到令人满意的解决方案。人们会认为这是 BI 开发中相当常见的问题/情况?!
谢谢, 亚历克斯
I want to populate a star schema / cube in SSIS / SSAS.
I prepared all my dimension tables and my fact table, primary keys etc.
The source is a 'flat' (item level) table and my problem is now how to split it
up and get it from one into the respective tables.
I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?!
Thanks,
alexl
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
首先,这取决于您想要执行简单的初始数据传输还是更复杂的操作(例如增量)。我假设您正在进行初始数据传输。
假设您的项目表具有如下列:
id, cat1, cat2, cat3, cat4, ...
假设类别 1-4 有列id, cat_name
,您可以加载dim_cat1(项目类别 1 的维度表)如下:您可以对所有其他类别/维度表执行相同的操作。我假设您的维度表已自动生成 ID。现在,加载事实表:
如果您有大量数据,那么在 item_table 和维度表中的类别名称上创建索引可能是有意义的。
顺便说一句,这是一个独立于数据库的答案,我不使用 SSIS/SSAS:您可能有可用的工具可以为您简化此过程的部分内容,但用纯 SQL 编写实际上并不困难/耗时。
For a start, it depends on whether you want to do a simple initial data transfer or something more sophisticated (e.g. incremental). I'm going to assume you're doing an initial data transfer.
Say your item table has columns as follows:
id, cat1, cat2, cat3, cat4, ...
Assuming categories 1-4 have columnsid, cat_name
, you can load dim_cat1 (the dimension table of item category 1) as follows:You can do the same for all of the other categories/dimension tables. I'm assuming your dimension tables have automatically generated IDs. Now, to load the fact table:
If you have a substantial amount of data, it might make sense to create indexes on the category names in the item_table and maybe the dimension tables.
Btw, this is a database-independent answer, I don't work with SSIS/SSAS: you might have tools available which streamline parts of this process for you, but it's really not that difficult/time consuming to write in plain SQL.
我们通过使用数据流任务将自上次包执行时间以来的信息复制到临时暂存表中,然后根据键使用这些暂存表中的数据更新存档/仓库,然后插入尚不存在的行。截断暂存表,为下次做好准备,添加审计负载。工作完成了吗?
We do this by using a dataflow task to copy information since the last package execution time into a temp staging tables, then update the archive/warehouse with data from those staging tables based on a key, then insert those rows which don't exist yet. Truncate the staging table ready for next time, add a load of auditing. Job Done?
我经常针对操作数据存储而不是星型模式构建多维数据集。星型模式的性能几乎总是更好,但对于原型设计/测试,不要害怕针对您没有的数据开发多维数据集,而不是您想要的星型模式。
I frequently build cubes against Operational Data Stores instead of star schemas.. Performance will almost always be better with a Star Schema, but for prototyping / testing, don't be afraid to develop cubes against the data you HAVE not the star schema you WANT.