SSIS 中查找任务的替代方案
我正在为数据仓库开发一个SSIS解决方案,用于提取相应应用程序键的代理键,我正在使用SSIS的查找任务,但该任务的问题是它将完整的查找表缓存在其内存中。我的查找表大小很大,即 2000 万条记录。因此,如果您可以建议一些查找任务的方法或替代方案
i am working on a SSIS solution for datawarehouse for extracting Surrogate keys of corresponding application keys, I am using look up task of SSIS but the problem with this task is it caches the complete look up table in its memory . And my look up table size is huge i.e. 20 million records. So if u can suggest some ways or alternatives for look up task
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
我不认为包含 2000 万条记录的表太大而无法查找。您可以进行一些过滤,并通过在查找中仅选择所需的列,可以对其进行优化以使用少量内存。
例如,如果您有一个 int 类型的键列和一个大小为 10 的 varchar 列用于查找,则一条记录将占用 4+10 个字节,2000 万个字节将达到 20Mx(4+10) ~= 280MB,这不能被认为太高。
不过,如果您想减少内存使用,则必须使用联接。
I do not consider a table with 20 million records too huge for look up. You can do some filtering and by selecting only the required columns in the lookup you can optimize it for using small amount of memory.
For example if you have a key column of type int and a varchar column of size 10 needed for look up, a record will take 4+10bytes and 20million goes to 20Mx(4+10) ~= 280MB which cannot be considered as too high.
Still if you want to reduce memory usage, you will have to use joins.
当您将数据带入 SSIS 包时,对查找数据执行 LEFT JOIN,然后评估您需要的内容。
如果查找表位于不同的源中,那么您可以在 SSIS 中执行 LEFT JOIN,但这也会缓存行。我认为 JOIN 可能比 Lookup 稍微快一些。
Do a LEFT JOIN with your Lookup data when you bring the data into the SSIS package and then evaluate what you need to.
If the lookup table is in a different source, then you can do a LEFT JOIN in SSIS, but that is going to cache rows as well. I think that the JOIN may be marginally faster than a Lookup.
需要扫描整个表吗?即,您可以将查找指定为表上的数据库视图,或者甚至将其指定为 SQL 查询的结果(使用 SQL 查询选项的结果)
Do you have to scan the whole table? I.e. can you specify your lookup as a database View on the table, or even specify it as the results of a SQL Query (Use results of an SQL Query option)
确保在查找表中仅选择所需的列,不要缓存不需要的列。找一些时间看看MS“项目“Real” 在高数据量应用程序中使用 SSIS 并讨论最佳实践。
Make sure that you pick only columns that you need in the look-up table, do not cache columns which are not needed. Find some time to take a look at MS "Project "Real" which uses SSIS in high data-volume applications and discusses best practices.