How do I split a large dataset into multiple Excel spreadsheets using an SSIS package?
I'm facing a problem with an SSIS package. It does the following:
- Executes a query to get some data from the database (SQL Server 2008) in a Data Flow Task
- Exports the extracted data to an Excel 97-2003 spreadsheet (.xls) using an Excel Destination
As most of you know, xls files are limited to 65,536 rows by 256 columns per sheet. So when the query returns more records than the row limit (65,536), the Excel Destination step fails.
I get the following error messages.
Error: 0xC0202009 at Calidad VIDA, Excel Destination [82]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
Error: 0xC0209029 at Calidad VIDA, Excel Destination [82]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "input "Excel Destination Input" (93)" failed because error code 0xC020907B occurred, and the error row disposition on "input "Excel Destination Input" (93)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
Error: 0xC0047022 at Calidad VIDA, SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Excel Destination" (82) failed with error code 0xC0209029 while processing input "Excel Destination Input" (93). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
Error: 0xC02020C4 at Calidad VIDA, OLE DB Source [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at Calidad VIDA, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
The file needs to be in that format because the clients don't have newer versions of Excel, and they don't want to buy licenses. Does anyone know how to work around this issue?
Should I use a Script Task and build the Excel file myself, or should I use a Foreach loop and create several Excel workbooks?
Comments (3)
Here is one option for creating Excel worksheets dynamically in SSIS, based on how many records you want to write per Excel sheet. It does not involve Script Tasks. The following example shows how this can be achieved using Execute SQL Tasks, a For Loop Container, and a Data Flow Task. The example was created using SSIS 2008 R2.

Step-by-step process:

1. In the SQL Server database, run the scripts provided under the SQL Scripts section. These scripts create a table named `dbo.SQLData` and populate it with multiplication data from 1 x 1 through 20 x 40, thereby creating 800 records. The scripts also create a stored procedure named `dbo.FetchData`, which is used in the SSIS package.
2. In the SSIS package, create 9 variables as shown in screenshot #1. The following steps describe how each of these variables is configured.
3. Set the variable ExcelSheetMaxRows to the value 80. This variable represents the number of rows to write per Excel sheet. You can set it to a value of your choice. In your case, this would be 65,535 (you might want to leave 1 row for the header column names).
4. Set the variable SQLFetchTotalRows to the value `SELECT COUNT(Id) AS TotalRows FROM dbo.SQLData`. This variable contains the query that fetches the total row count from the table.
5. Select the variable StartIndex and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value `(@[User::Loop] * @[User::ExcelSheetMaxRows]) + 1`. Refer to screenshot #2.
6. Select the variable EndIndex and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value `(@[User::Loop] + 1) * @[User::ExcelSheetMaxRows]`. Refer to screenshot #3.
7. Select the variable ExcelSheetName and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value `"Sheet" + (DT_WSTR,12) (@[User::Loop] + 1)`. Refer to screenshot #4.
8. Select the variable SQLFetchData and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value `"EXEC dbo.FetchData " + (DT_WSTR, 15) @[User::StartIndex] + "," + (DT_WSTR, 15) @[User::EndIndex]`. Refer to screenshot #5.
9. Select the variable ExcelTable and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value provided under the ExcelTable Variable Value section. Refer to screenshot #6.
10. On the SSIS package's Control Flow tab, place an Execute SQL Task and configure it as shown in screenshots #7 and #8. This task fetches the record count.
11. On the SSIS package's Control Flow tab, place a For Loop Container and configure it as shown in screenshot #9. Please note that this is a For Loop and not a Foreach Loop. The number of iterations depends on the number of records to write to each Excel sheet and the total number of records found in the table.
12. Create an Excel spreadsheet in Excel 97-2003 format with the .xls extension, as shown in screenshot #10. I created the file in C:\temp.
13. In the SSIS package's connection managers, create an OLE DB connection named SQLServer pointing to SQL Server and an Excel connection named Excel pointing to the newly created Excel file.
14. Click on the Excel connection and select Properties. Change the property DelayValidation from False to True so that no error messages appear when we switch to using a variable for sheet creation in the Data Flow Task. Refer to screenshot #11.
15. Inside the For Loop Container, place an Execute SQL Task and configure it as shown in screenshot #12. This task creates the Excel worksheets as required.
16. Inside the For Loop Container, place a Data Flow Task. Once the tasks are configured, the Control Flow tab should look as shown in screenshot #13.
17. Inside the Data Flow Task, place an OLE DB Source to read data from SQL Server using the stored procedure. Configure the OLE DB Source as shown in screenshots #14 and #15.
18. Inside the Data Flow Task, place an Excel Destination to insert the data into the Excel sheets. Configure the Excel Destination as shown in screenshots #16 and #17.
19. Once the Data Flow Task is configured, it should look as shown in screenshot #18.
20. Delete the Excel file that was created in step 12, because the package will create the file automatically when executed. If the file is not deleted, the package throws an exception saying that Sheet1 already exists. This example uses the path C:\temp\ and screenshot #19 shows that there are no files in that path.
21. Screenshots #20 and #21 show the package execution inside the Control Flow and Data Flow tabs.
22. Screenshot #22 shows that the file ExcelData.xls has been created in the path C:\temp. Remember, this path was empty earlier. Since the table had 800 rows and the package variable ExcelSheetMaxRows was set to 80 rows per sheet, the Excel file has 10 sheets. Refer to screenshot #23.

NOTE: One thing that I haven't done in this example is check whether the file ExcelData.xls already exists in the path C:\temp. If it exists, the file should be deleted before the tasks are executed. This can be achieved by creating a variable that holds the Excel file path and using a File System Task to delete the file before the first Execute SQL Task is executed.

Hope that helps.
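To make the variable expressions easier to check, here is a minimal Python sketch of the arithmetic they perform for each value of Loop. It is not part of the package and only mirrors the StartIndex, EndIndex, ExcelSheetName, and SQLFetchData expressions given in the steps. The function names `sheet_ranges` and `fetch_command` are made up for illustration, and the loop bound (rounding the sheet count up) is an assumption, since the actual For Loop condition is only visible in screenshot #9.

```python
import math


def sheet_ranges(total_rows, max_rows_per_sheet):
    """Yield (sheet_name, start_index, end_index) for each loop iteration."""
    # Assumption: the loop runs once per sheet needed, rounding up.
    sheet_count = math.ceil(total_rows / max_rows_per_sheet)
    for loop in range(sheet_count):
        start_index = (loop * max_rows_per_sheet) + 1   # StartIndex expression
        end_index = (loop + 1) * max_rows_per_sheet     # EndIndex expression
        sheet_name = "Sheet" + str(loop + 1)            # ExcelSheetName expression
        yield sheet_name, start_index, end_index


def fetch_command(start_index, end_index):
    """Build the same string as the SQLFetchData expression."""
    return "EXEC dbo.FetchData " + str(start_index) + "," + str(end_index)


ranges = list(sheet_ranges(800, 80))
print(len(ranges))                  # 10
print(ranges[0])                    # ('Sheet1', 1, 80)
print(ranges[-1])                   # ('Sheet10', 721, 800)
print(fetch_command(*ranges[0][1:]))  # EXEC dbo.FetchData 1,80
```

When the total is not an exact multiple of the sheet size, the last EndIndex overshoots the row count (850 rows at 80 per sheet gives a final range of 801 to 880), so the last sheet would simply hold fewer rows, assuming the stored procedure filters on a row-number range.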
ExcelTable Variable Value:
SQL Scripts:
Screenshots #1 to #23: [images not included]
The stock Excel rendering option can paginate output onto separate tabs. If you force a page break after the appropriate number of rows, each "page" in your output gets its own tab. I don't have the settings I used for this in the past to hand, but I can look them up tomorrow if you need me to.
What also worked for me was using the SSIS package to export to a CSV file and then manually importing the data into Excel. Note that this is not the same as "Open" in Excel, because that also stops at 65,536 rows. Create a new xlsx file and click "Data" -> "From Text". It will import and show all rows. Tested with 750,000 rows.
However, I'm not sure whether the csv -> xlsx conversion is easily scripted within an SSIS package. Most likely it would be done via a Script Task using the Excel COM object.
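If the CSV route is used but the recipients can still only open .xls-sized sheets, the export could be cut into parts that each fit under the row limit before importing. The following is a rough Python sketch of that idea, run outside SSIS. It is not something the answer above describes, and the helper name `split_rows` is hypothetical. It repeats the header row at the top of every part.

```python
import csv
import io

XLS_MAX_ROWS = 65536  # rows per worksheet in the .xls format, per the question


def split_rows(reader, max_data_rows):
    """Yield lists of rows: the header first, then at most max_data_rows data rows."""
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to split
    chunk = [header]
    for row in reader:
        chunk.append(row)
        if len(chunk) - 1 == max_data_rows:
            yield chunk
            chunk = [header]
    if len(chunk) > 1:
        yield chunk  # final, partially filled part


# Small in-memory example: 5 data rows split into parts of 2 data rows each.
sample = io.StringIO("Id,Value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
chunks = list(split_rows(csv.reader(sample), 2))
print([len(c) for c in chunks])  # [3, 3, 2]
```

For a real export, `max_data_rows` would be `XLS_MAX_ROWS - 1` to leave room for the header, and each part would be written out with `csv.writer` to its own file.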