检查更新的维度数据

发布于 2024-09-14 05:56:06 字数 410 浏览 0 评论 0原文

我有一个 OLTP 数据库,目前正在创建一个数据仓库。 DW(DimStudents)中有一个维度表,其中包含学生数据,例如地址详细信息、电子邮件、通知设置。

在 OLTP 数据库中,这些数据分布在多个表中(因为它是第三范式的标准 OLTP 数据库)。

目前有 10,390 条记录,但这一数字预计还会增长。

我想使用类型 2 ETL,如果 OLTP 数据库中的记录发生更改,则会将新记录添加到 DW 中。

扫描 DW 中的 10,000 条记录,然后将结果与 OLTP 中包含的多个表中的结果进行比较的最佳方法是什么?

我正在考虑使用 OLTP 数据的临时表创建“快照”,然后将结果逐行与 DW 中维度表中的数据进行比较。

我正在使用 SQL Server 2005。这似乎不是最有效的方法。还有其他选择吗?

I have an OLTP database, and am currently creating a data warehouse. There is a dimension table in the DW (DimStudents) that contains student data such as address details, email, notification settings.

In the OLTP database, this data is spread across several tables (as it is a standard OLTP database in 3rd normal form).

There are currently 10,390 records but this figure is expected to grow.

I want to use Type 2 ETL whereby if a record has changed in the OLTP database, a new record is added to the DW.

What is the best way to scan through 10,000 records in the DW and then compare the results with the results in several tables contained in the OLTP?

I'm thinking of creating a "snapshot" using a temporary table of the OLTP data and then comparing the results row by row with the data in the Dimension table in the DW.

I'm using SQL Server 2005. This doesn't seem like the most efficient way. Are there alternatives?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

北方。的韩爷 2024-09-21 05:56:06
  • 将 LastUpdated 引入源系统 (OLTP) 表中。这样您就可以使用以下方法提取更少的内容:

    WHERE LastUpdated >= some_time_here

您似乎正在使用 SQL Server,因此您也可以尝试 rowversion 类型(8 字节 db-scope-unique 计数器)

  • 将数据导入 DW 时,使用 ETL工具(SSIS、Pentaho、Talend)。它们都有一个组件网络(块、转换)来处理 SCD2(缓慢变化的维度类型 2)。对于 SSIS 示例请参阅此处。该转换正是您想要做的事情——您所要做的就是指定要监视哪些列以及检测到更改时要执行的操作。
  • Introduce LastUpdated into source system (OLTP) tables. This way you have less to extract using:

    WHERE LastUpdated >= some_time_here

You seem to be using SQL server, so you may also try rowversion type (8 byte db-scope-unique counter)

  • When importing your data into the DW, use ETL tool (SSIS, Pentaho, Talend). They all have a componenet (block, transformation) to handle SCD2 (slowly changing dimension type 2). For SSIS example see here. The transformation does exactly what you are trying to do -- all that you have to do is specify which columns to monitor and what to do when it detects the change.
混浊又暗下来 2024-09-21 05:56:06

听起来你正在接近这种倒退。执行 ETL(提取、测试、加载)的典型方法是:

  1. 从 OLTP 数据库中“提取”数据
  2. 将提取的数据与维度数据进行比较(“测试”),以确定是否存在更改或需要执行任何其他验证
  3. 将数据(“加载”)插入维度表中。

实际上,在第 1 步中,您将通过查询 OLTP 数据库中的多个表来创建物理记录,然后将该结果记录与维度数据进行比较,以确定是否进行了修改。这是做事的标准方式。此外,就数量而言,10000 行是相当微不足道的。任何 RDBMS 和 ETL 进程都应该能够在最多不超过几秒钟的时间内完成处理。我知道 SQL Server 有 DTS,但我不确定该名称在最新版本中是否已更改。这是执行此类操作的完美工具。

It sounds like you are approaching this sort of backwards. The typical way for performing ETL (Extract, Test, Load) is:

  1. "Extract" data from your OLTP database
  2. Compare ("Test") your extracted data against the dimensional data to determine if there are changes or whatever other validation needs to be performed
  3. Insert the data ("Load") in to your dimension table.

Effectively, in step #1, you'll create a physical record via a query against the multiple tables in your OLTP database, then compare that resulting record against your dimensional data to determine if a modification was made. This is the standard way of doing things. In addition, 10000 rows is pretty insignificant as far as volume goes. Any RDBMS and ETL process should be able to process through that in a matter of no more than few seconds at most. I know SQL Server has DTS, although I'm not sure if the name has changed in more recent versions. That is the perfect tool for doing something like this.

昇り龍 2024-09-21 05:56:06

您的 OLTP 数据库有审计跟踪吗?

如果是这样,那么您可以在审计跟踪中查询自上次 ETL 以来已触及的记录。

Does you OLTP database have an audit trail?

If so, then you can query the audit trail for just the records that have been touched since the last ETL.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文