数据仓库事实表问题

发布于 2024-09-24 13:23:45 字数 287 浏览 2 评论 0原文

我有一个名为“贷款”的事实表。在此表中,有贷款标识符、贷款日期和贷款金额。

我不太知道如何在数据仓库中执行的业务需求如下。未来贷款金额可以调整。假设 8 月 1 日我们发放了一笔贷款,贷款 ID 为 1,金额为 20,000。 10月1日这笔贷款调整为19000。我是否在事实表中放置了两个具有相同标识符、不同日期和金额的条目?

我是否要创建一个新表(维度表),其中包含贷款金额和日期?实现这个场景的最佳方法是什么?

在生产数据库中,我们有一个贷款总额表,然后有一个贷款金额表,因此贷款金额的组合等于总贷款金额。

I have a fact table called Loans. In this table in a loan identifier, the date the loan was made, and the amount of the loan.

The business requirement I don't quite know how to do in the datawarehouse is the following. The loan amount can be adjusted in the future. So lets say August 1st we make a loan with an id of 1 and a amount of 20,000. October 1st this loan is adjusted to 19,000. Do I put two entries in the fact table with the same identifier, and different dates and amounts?

Do I create a new table (dimension table) with loan amount and date in it? What is the best way to do this scenario?

In the production database we have a table for total loan amount, and then a table off of that for loanAmounts so a combination of loanAmounts equals totalLoanAmount.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

金橙橙 2024-10-01 13:23:45

我总是建议将此类事实表视为完全可总结的,具有任何提款、付款、利息等的有效移动交易 - 如果你这样做,你可以拥有绝对的灵活性来随时间报告,按交易类型过滤等 - 存储多个当您跨维度合并不同指标时,版本同一事实表中的行或更新快照表可能是其他报告选项,但在基本级别通常逐项列出这些事件。

中号

I would always recommend treating such fact tables as fully-summable, having what are effectively movement transactions for any uplift, payment, interest etc - if you do this you can have absolute flexibility to report over time, filter by transaction type etc - storing multiple version rows in the same fact table or updating a snapshot table maybe other options for reporting when you are amalgamating different metrics across dimensions but at the base level it is usual to have these line by line events itemised.

M

冰魂雪魄 2024-10-01 13:23:45

将 LoanID 视为退化维度并单独输入校正。
您还可以使用交易类型维度来描述:贷款、付款、更正等。

DateKey CustomerKey TransacionTypeKey LoanID    Amount
---------------------------------------------------------
20100801    175          1               1     20000.00
20101001    175          2               1    - 1000.00

按客户、贷款显示所有贷款

select
      CustomerKey
    , LoanID
    , sum(Amount) as Amt
from factLoan
group by CustomerKey, LoanID
order by CustomerKey, LoanID ;

Treat the LoanID as a degenerate dimension and enter the correction separately.
You may also use a transaction type dimension to describe: loans, payments, corrections etc.

DateKey CustomerKey TransacionTypeKey LoanID    Amount
---------------------------------------------------------
20100801    175          1               1     20000.00
20101001    175          2               1    - 1000.00

Show all loans by customer, by loan

select
      CustomerKey
    , LoanID
    , sum(Amount) as Amt
from factLoan
group by CustomerKey, LoanID
order by CustomerKey, LoanID ;
一个人的旅程 2024-10-01 13:23:45

这称为缓慢变化的维度,它取决于业务需求、天气或您是否想要跟踪该行中发生的变化。
通常有 3 种缓慢变化的维度。

在类型 1 缓慢变化的维度中,新信息只是覆盖原始信息。换句话说,没有保留任何历史记录。

在类型 2 缓慢变化的维度中,表中添加一条新记录来表示新信息。因此,原始记录和新记录都会出现。新记录有自己的主键。

在类型 3 缓慢变化的维度中,将有两列指示感兴趣的特定属性,一列指示原始值,一列指示当前值。还有一列指示当前值何时变为活动状态。

您可以参考 Kimball 手册以获取有关缓慢变化尺寸的更多信息。

this is called slowly changing dimensions, and it depends on the business requirements, weather or not you want to keep trace of the changes that occurred in that row or not.
There is typically 3 types of slowly changing dimensions.

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

you can refer to kimball manual for more information about slowly changing dimensions.

如歌彻婉言 2024-10-01 13:23:45

您是否确实需要多个版本取决于需求。您是否只需要按原样报告贷款金额,还是还需要了解原样的情况?如果您不确定,那么保留历史记录(多个版本)可能是有意义的。我的假设是保留历史。

我建议为每个新版本创建新行和日期列来表示版本应用的日期。

Whether you actually need multiple versions depends on requirements. Do you only need to report on the loan amounts as-is or do you need to know the as-was position as well? If you aren't sure then it probably makes sense to keep the history (multiple versions). My assumption would be to keep the history.

I would suggest creating new rows for each new version and date column(s) to represent the date(s) the version applies.

居里长安 2024-10-01 13:23:45

正如你所说,贷款是事实。贷款可以调整,即贷款金额可以增加或减少。有两种一般方法可以处理此问题:

  1. 插入一个新行,其中包含事实的增量。其中现在有两行贷款。这意味着贷款事实的主键不能仅是贷款 ID。它(逻辑上,不一定是物理上)是加载 ID 和日期,或贷款 ID 和条目编号(日期作为另一个属性)。结果就如你所说。我会将 TransactionTypeKey 更改为 TransactionTypeCode,这是更正确的命名。

  2. 用新的余额更新事实。在这种情况下,最终结果会被存储,但更改会丢失。主键是LoanID;日期是一个属性。

大多数其他应用程序(例如订单处理)将使用第一个解决方案,我认为这是您所需要的。解决方案 1 的优点还在于它是完全相加的。

至于Saad El Oualji的评论,这根本不是SCD的案例,而是一个详细的事实设计。他对SCD的描述是正确的,但SCD描述的是维度变化,而不是事实变化。

As you say, Loan is a fact. Loans can be adjusted, meaning that the loan amount can be increased or decreased. There are two general ways to deal with this:

  1. Insert a new row with the delta to the fact. In this there are now two rows for the loan. This means that the primary key for the Loan fact cannot be Loan ID alone. It has (logically, not necessarily physically) to be Load ID and Date, or Loan ID and Entry Number (with date as another attribute). The result would be as you stated. I would change TransactionTypeKey to TransactionTypeCode, which is more correct naming.

  2. Update the fact with the new balance. In this case the final result is stored but the changes are lost. The primary key is LoanID; date is an attribute.

Most other applications, such as order processing, will use the first solution, which is what I think you need. Solution 1 has the advantage also that it is fully additive.

As to the comment from Saad El Oualji, this is not a case of SCD at all but of a detailed fact design. His description of SCD is correct but SCD describes dimension changes, not fact changes.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文