使用查找组件进行 SSIS 查找与脚本组件
我需要从 EDW 表(它确实维护历史记录)加载维度,并且类型为键值参数。
如果在 EDW 中获得如下记录,我的情况就可以了
Key1 Key2 Code Value EffectiveDate EndDate CurrentFlag
100 555 01 AAA 2010-01-01 11.00.00 9999-12-31 Y
100 555 02 BBB 2010-01-01 11.00.00 9999-12-31 Y
这需要通过旋转它来加载到 DM,因为
key1 和 key2 组合使 DM 的自然键
SK NK 01 02 EffectiveDate EndDate CurrentFlag
1 100-555 AAA BBB 2010-01-01 11.00.00 9999-12-31 Y
我的 ssis 包完成了这一切良好的旋转...在 DIM 中查找传入的 NK .. 如果新的将插入 .. else 进一步查找有效日期,并确定相同自然键的传入属性是否有任何新的(更改)。如果是,则通过设置其结束日期来更新当前记录并插入新记录使用新的属性值并提取其他属性的最近记录值。
我的问题是,如果相同的自然键在单个提取中以相同的属性出现两次,我的第一次查找在自然键上..将让两条记录通过并尝试插入..失败的地方。如果我在 NK 上获得不同的记录,则不会选择第二个记录,需要再次运行包。
所以我的问题是,当同一个 NK 在单个提取中出现两次时,我如何配置查找或替代方法来处理这种情况,如果 Dim 表中不存在,则能够插入第一条记录,而第二条记录应该能够通过更改进行更新参考上面插入的一个。
不确定我试图解释的内容是否有意义。将在返回办公桌后附上屏幕截图(周一)。
谢谢
I need to load Dimensions from EDW Tables (which does maintain historical records) and is of type Key-Value-Parameter.
My scenario is ok if got a record in EDW as below
Key1 Key2 Code Value EffectiveDate EndDate CurrentFlag
100 555 01 AAA 2010-01-01 11.00.00 9999-12-31 Y
100 555 02 BBB 2010-01-01 11.00.00 9999-12-31 Y
This need to be loaded into DM by pivoting it as
key1 and key2 combinations makes Natural key for DM
SK NK 01 02 EffectiveDate EndDate CurrentFlag
1 100-555 AAA BBB 2010-01-01 11.00.00 9999-12-31 Y
My ssis package does this all good pivoting... looking up the incoming NK in DIM.. if new will insert .. else with further lookup with effective date and determine if the incoming for same natural key got any new (change) in attribute.. if so updates the current record byy setting its end date and insert the new one with new attribute value and pulling the recent records values for other attributes.
My problem is if the same natural key comes twice with same attribute in single extract my first lookup which on natural key .. will let both records pass and try to insert.. where its fails. If i get distinct records on NK the second is not picked and need to run package again.
So my question how can i configure lookup or alernative way to handle this scenario when same NK comes twice in single extract, would be able to insert first record if not exists in Dim table and for second one should be able to updated with the changes with reference to one inserted above.
Not sure this makes sense what am trying to explain. Will attached the screenshot once back to work desk (on monday).
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
查找对此不太好 - 通过缓存和所有内容,它只是无法查找之前设置的值。
您最好将其传递给 SQL 命令任务并让存储过程根据它找到的内容执行插入或过期/插入。
您还可以将它们流式传输到表中并批量执行。
为了解决您的流程及其尝试填充的模型:
首先,当输入中的行顺序导致行为差异时总是很尴尬 - 即 NK = A,Val = 1 然后 NK = A,Val = 2 给出不同的行为比 NK = A, Val = 2 那么 NK = A, Val = 1。人们不得不怀疑这是否是正确的尺寸设计。请记住,所有维度属性都是根据实用选择分配给维度表的。最终尺寸可以随意排列到表格中 - 因此不同的设计可能更有意义。如果单次加载中的情况发生变化,则可能表明您需要分解该加载以匹配粒度(而不是尝试一次加载 2 天的数据)。
我注意到您的维度中有一个生效日期和一个结束日期。现在,这听起来很像维度行为的属性(其中 01 和 02 代码在 NK 上发生变化),而不是该维度将附加到的事实。这可能表明它需要在单独的非事实事实表中进行跟踪,例如(SK、EffectiveDate、EndDate) - 或者它并不重要,因为您关心的只是附加到事实的 NK、01、02 组合,在这种情况下,您的自然键实际上是 NK、01、02 的全部。
我建议返回到您的事实表和传入的提要以及预期的用法,并更仔细地查看它,看看这是否可能需要成为一个单独的非事实表跟踪这些维度变化的事实表。
另外,如果您可以发布更多详细信息,这可能会有所帮助,当我看到更多商业案例时,我会看看金博尔的材料对此有何评论。
Lookup is not good for this - with caching and everything, it's just not able to lookup on it's previously set values.
You might be better off passing it to a SQL Command task and having a stored proc do an insert or expire/insert depending on what it finds.
You could also stream them to a table and do it in a batch.
To address your flow and the model it's trying to populate:
To start with, it's always awkward when the order of rows in the input causes behavioral differences - i.e. NK = A, Val = 1 then NK = A, Val = 2 gives different behavior than NK = A, Val = 2 then NK = A, Val = 1. One has to wonder if this is the correct dimensional design. Remember that all dimensional attributes are assigned to dimensional tables based on a pragmatic choice. Ultimately dimensions can be arranged into tables at will - so a different design might make more sense. If things are changing within a single load, that may indicate you need to break up that load to match the grain (not attempting to load 2 days data at one time).
I notice that there is an Effective Date and an End Date in your dimension. Right now this sounds a lot like a property of the dimension behavior (where your 01 and 02 codes are changing on a NK) and not of the facts that this dimension is going to be attached to. This might indicate that it needs to be tracked in a separate factless fact table, say (SK, EffectiveDate, EndDate) - or that it's just not important, because all you care about is a NK, 01, 02 combination attached to a fact, in which case, your natural key is really all of NK, 01, 02.
I recommend going back to your fact table and the incoming feed and the expected usage and looking at it more closely and see if this maybe need to be a separate factless fact table tracking these dimension changes.
Also, if you could post more details, that might help, and I'll see what Kimball's materials have to say about it when I see more of a business case.
凯德的评论是正确的 - 但我相信你的主要问题是重复,句号。 “源”流中具有同一 NK 的两个版本这一事实是否表明存在两个独立的、有意义的版本?或者只有“最后”版本才重要?
如果两个版本中反映的更改都应反映在您的维度表中,那么我同意凯德的建议,将您的处理分成批次。您可以按 NK(和更改时间)对输入进行排序,然后使用行计数脚本枚举每个 NK 的“版本”,然后按版本号处理“批次”。
如果仅需要将最后一个“版本”合并到维度表中,我建议您在使用查找之前消除重复项。
Cade's comments are spot on - but I believe your main issue is the duplicates, period. Does the fact that you have two versions of the same NK in the "source" stream indicate two separate, meaningful versions? Or does only the "last" version matter?
If changes reflected in both versions should be reflected in your dimension table, then I echo Cade's suggestion to divide your processing into batches. You could sort your input by the NK (and time of change), then use a row counting script to enumerate the "versions" of each NK, then process "batches" by version number.
If only the last "version" need be incorporated into the dimension table, I suggest you eliminate duplicates prior to using a Lookup.