SSIS consolidating large amounts of data - suggestions
I need to consolidate data from 30 databases and 20 tables. My first attempt was to create one data flow for each table. Each data flow would have 30 DB sources, each taking data from the one table in its database and then inserting it into one destination. This doesn't work because there is not enough memory. I tried filtering out data that isn't going to be used, but it didn't help. These tasks should run on a daily basis. The whole consolidation DB is deleted and then new data is inserted, which means huge amounts of data are loaded every day. Does anyone have a suggestion for how to handle the memory problem?
The only solution I see for now is to create 20 packages. Each package would have 30 data flows, and each data flow would take data from only one database. But then again, I don't know what the order of execution would be: do the data flows in one package start executing at the same time, or do they run one by one?
EDIT:
This is how I did it... here's a screenshot of the Control Flow:
Screenshot of Data Flow:
I created one package per table, covering that table across all the databases. In this case, each data flow takes four years of data from the table and imports it into a source database. The problem is that one table in one of the databases holds too much data, and the package fails with this error:
The buffer manager failed a memory allocation call for 10484608 bytes, but was unable to swap out any buffers to relieve memory pressure. 20 buffers were considered and 20 were locked. Either not enough memory is available to the pipeline because not enough are installed, other processes were using it, or too many buffers are locked.
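For reference, the 10,484,608-byte allocation that failed is a single pipeline buffer (the default buffer size is roughly 10 MB), so the error is governed by the Data Flow Task's buffer settings. One common mitigation, not from the original post, is to lower DefaultBufferMaxRows and DefaultBufferSize so each buffer holds fewer rows. A hedged sketch, assuming a hypothetical package Consolidate.dtsx with a data flow task named DFT_LoadTable, passing the override at run time:

    dtexec /File "C:\SSIS\Consolidate.dtsx" ^
        /Set "\Package\DFT_LoadTable.Properties[DefaultBufferMaxRows]";5000 ^
        /Set "\Package\DFT_LoadTable.Properties[DefaultBufferSize]";1048576

The same two properties can also be lowered in the Data Flow Task's property grid in BIDS/SSDT.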
Any suggestions?
Here is what I would suggest if you are planning to do this in SSIS.
Let's assume that you have tables A, B, C, ..., R, S, T in databases DB1, DB2, ..., DB30, and that your consolidation database is MainDB.
Create an SSIS package that contains one or more data flow tasks to transfer data from the tables in database DB1 to the tables in database MainDB. To do this, you can use an OLE DB Source to retrieve data from DB1. If you have to apply some transformations, you can do that with the appropriate transformation tasks. To insert data into the destination tables, you can use an OLE DB Destination.
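If the four-year filter from the question applies here, it is worth pushing it into the source query itself so the unwanted rows never enter the pipeline. A hedged sketch, with hypothetical table and column names:

    SELECT  col1, col2, col3
    FROM    dbo.TableA
    WHERE   CreatedDate >= DATEADD(YEAR, -4, GETDATE());

This query would go into the OLE DB Source's "SQL command" data access mode instead of selecting the whole table.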
Save the connection string in a package configuration file (dtsConfig).
Create 30 different configuration files, one containing the connection string for each of the 30 databases that you have.
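A package configuration file is plain XML, so the 30 copies can be generated and edited by hand or by script. A minimal sketch of one file, assuming the package's source connection manager is named SourceDB (the server, database, and provider values are placeholders):

    <?xml version="1.0"?>
    <DTSConfiguration>
      <Configuration ConfiguredType="Property"
                     Path="\Package.Connections[SourceDB].Properties[ConnectionString]"
                     ValueType="String">
        <ConfiguredValue>Data Source=MyServer;Initial Catalog=DB1;Provider=SQLNCLI10.1;Integrated Security=SSPI;</ConfiguredValue>
      </Configuration>
    </DTSConfiguration>

Each of the 30 files would differ only in Initial Catalog (DB1 ... DB30), and in Data Source if the databases live on different servers.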
You can create 30 SQL jobs under SQL Server Agent to schedule the daily run.
Each of the SQL jobs will run the same SSIS package but make use of the corresponding configuration file to point to the correct database.
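A hedged sketch of one such job, created in T-SQL against msdb with hypothetical names and file paths; the step simply runs dtexec with the package and the DB1 configuration file, and the schedule fires daily at 02:00:

    USE msdb;

    EXEC dbo.sp_add_job      @job_name = N'Consolidate - DB1';

    EXEC dbo.sp_add_jobstep  @job_name  = N'Consolidate - DB1',
                             @step_name = N'Run consolidation package for DB1',
                             @subsystem = N'CmdExec',
                             @command   = N'dtexec /File "C:\SSIS\Consolidate.dtsx" /ConfigFile "C:\SSIS\Config\DB1.dtsConfig"';

    EXEC dbo.sp_add_schedule @schedule_name = N'Daily 2am',
                             @freq_type = 4, @freq_interval = 1,
                             @active_start_time = 020000;

    EXEC dbo.sp_attach_schedule @job_name = N'Consolidate - DB1',
                                @schedule_name = N'Daily 2am';

    EXEC dbo.sp_add_jobserver   @job_name = N'Consolidate - DB1';

The other 29 jobs would be identical apart from the job name and the dtsConfig path; the schedule can be shared by attaching it to each job.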
Other options are:
Use SQL Server Replication. You need a two-step process: truncate the data in MainDB using a SQL Agent job, then sync the data between the databases (a minimal sketch of the truncate step follows after these options).
Use commercial products like Redgate or Idera
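For the truncate step of the replication route, the SQL Agent job is plain T-SQL against MainDB. A minimal sketch, assuming the consolidated tables sit in the dbo schema with the names A through T used above:

    USE MainDB;

    TRUNCATE TABLE dbo.A;
    TRUNCATE TABLE dbo.B;
    -- ... repeat for the remaining consolidated tables ...
    TRUNCATE TABLE dbo.T;

If a table is referenced by foreign keys, substitute DELETE FROM for TRUNCATE TABLE on that table.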
Hope that gives you an idea.