仓库增量负载;更新维度表

发布于 2024-10-06 21:34:02 字数 270 浏览 1 评论 0原文

我试图弄清楚如何在数据进入我们的系统时增量加载我的事实和维度表。

有没有比以下更简单的方法:

  • dim_id = select id from dim_table where dim_table.value='dim value';
  • 如果行数 == 0 ->插入到dim_table ...
  • 插入到fact(dim,measure)values(dim_id,23131)

如果我有10个维度加载会变得相当麻烦

I'm trying to figure out how to incrementally load my fact and dimension tables as data come into our system.

Is there any simpler way than:

  • dim_id = select id from dim_table where dim_table.value='dim value';
  • if rowcount == 0 -> insert into dim_table ...
  • insert into fact(dim, measure) values(dim_id, 23131)

if i got 10 dimensions the loading gets rather cumbersome

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

暖风昔人 2024-10-13 21:34:02

您真的需要它们是增量的吗?不能使用UUID吗?

我不明白为什么你需要一个 dim_table。

如果您正在使用星型架构,则可以通过以下方式使其发挥作用。

Fact_table
----------
time_id          character(36)
geographic_id    character(36)
measure          whatyouwant

Dim Time
--------
time_id    character(36) (That matches the time_id inside your fact table)
...
...

Dim Geogrphic
-------------
geographic_id character(36) (that matches the geographic_id inside your fact_Table)
....
....

当您填写维度表时,会为您的维度生成一个 uuid。
当您填写事实表时,请查找维度表,以便获取与该值关联的 uuid。

编辑:我认为这可能对您有帮助 http://dev .mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Do you really need them to be incremental ? Can't you use a UUID ?

I do not understand why you need a dim_table.

If you are doing a Star Schema, this is how you can make it work

Fact_table
----------
time_id          character(36)
geographic_id    character(36)
measure          whatyouwant

Dim Time
--------
time_id    character(36) (That matches the time_id inside your fact table)
...
...

Dim Geogrphic
-------------
geographic_id character(36) (that matches the geographic_id inside your fact_Table)
....
....

When you fill up your dimensions tables generate a uuid for your dimensions.
When you fille up your fact table, lookup into your dimension tables so you can get the uuid associated with this value.

Edit : I think that might help you http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

雅心素梦 2024-10-13 21:34:02

在标准星型模式中,维度在事实之前加载。
因此,如果存在批量运行的 ETL 流程,最简单的解决方案是尝试更频繁地运行批量。如果您确实需要实时 DW 加载,则必须以某种方式延迟事实,以确保首先加载维度。

当谈到现成的解决方案时,您可能会考虑流事件处理。有多种商业工具,但 EsperTech 是开源的。该系统可以设置为连续运行的 ETL 转换。想象一个持续运行的 Kettle(Talend、SSIS、..)转换。

In a standard star schema, dimensions are loaded before facts.
So, if there is an ETL process in place which runs in batches, the simplest solution is to try to run the batch more often. If you do need a real-time DW loading, the facts have to be delayed somehow, to make sure that dimensions are loaded first.

When it comes to off-the shelf solutions, you may look into stream-event processing. There are several commercial tools out there, but EsperTech is open source. The system can be set-up as an ETL transformation that runs continuously. Think of an Kettle (Talend, SSIS, ..) transformation which keeps running.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文