处理数据仓库负载中的主键重复项
我目前正在构建一个 ETL 系统来从事务系统加载数据仓库。我的事实表的粒度是事务级别。为了确保不加载重复行,我在事实表上放置了一个主键,即事务 ID。
我遇到了事务被逆转的问题 - 在事务数据库中,这是通过状态完成的,我拾取状态,我可以计算出事务是否正在完成,或者回滚,以便我可以在仓库。但是,反转行将具有相同的事务 ID,因此我遇到了主键冲突。
我现在已经通过否定主键解决了这个问题,因此交易 ID 1 将是付款,交易 ID -1(仅在仓库中)将是冲销。
我考虑过生成BIT列的替代方案,其中0是正常的,1是反转的,然后将PK作为交易ID和BIT列。
我的问题是,这是一个好的做法吗?还有其他人遇到过这样的事情吗?作为参考,这是一个支付处理系统,因此值不会被修改,因此只会有交易和逆转。
I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID.
I've encountered a problem with transactions being reversed - In the transactional database this is done via a status, which I pick up and I can work out if the transaction is being done, or rolled back so I can load a reversal row in the warehouse. However, the reversal row will have the same transaction ID and so I get a primary key violation.
I've solved this for now by negating the primary key, so transaction ID 1 would be a payment, and transaction ID -1 (In the warehouse only) would be the reversal.
I have considered an alternative of generating a BIT column, where 0 is normal and 1 is reversal, then making the PK the transaction ID and the BIT column.
My question is, is this a good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values will not be modified, so there will only ever be transactions and reversals.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在大多数情况下,事实表有一个主键,它是多个 FK 的组合。因此,也许您可以使用 TransactionID 和 FK 的组合来
dimTransactionType
作为主键。dimTransactionType
看起来像这样:并且
不建议在 DW 中对位和标志进行修补 - 尽可能详细。
In most cases, a fact table has a primary key which is a composite of several FKs. So, maybe you could use a combination of the TransactionID and a FK to
dimTransactionType
as the primary key.The
dimTransactionType
would look something like:and would have
Tinkering with bits and flags is not recommended in a DW -- verbose as much as possible.
设计事实表的常见方法是使用代理键作为主键。大整数值通常就足够了。如果事务 ID 是维度记录的外键,则不应将其用作事实表中的主键。您的代理键逻辑(即事实表中新记录的触发器)可以基于事务 ID 和事务类型的组合。
A common approach to designing fact tables is to use surrogate keys as the primary keys. A large integer value is usually sufficient. If transaction id is a foreign key to a dimension record, then it should not be used as a primary key in the fact table. Your surrogate key logic (i.e. trigger for a new record in the fact table) can be based on a combination of the transaction id and the transaction type.