如何使用 SSIS 包循环遍历 Excel 文件并将其加载到数据库中?
我需要创建一个 SSIS 包,用于将多个 Excel 文件中的数据导入到 SQL 数据库中。我计划使用嵌套的 Foreach 循环容器来实现此目的。一个 Foreach 文件枚举器,并嵌套在其中,一个 Foreach ADO.net 架构行集枚举器
要考虑的问题:Excel 文件之间的工作表名称不同,但结构保持不变。
我已创建 Excel 连接管理器,但架构行集枚举器不接受枚举器配置中的连接管理器。
经过研究,我发现您可以使用 Jet Ole db 提供程序连接到 Excel 文件。但是,我只能指定 Microsoft Access 数据库文件作为数据源。尝试插入 Excel 文件作为数据源失败
经过更多研究,我发现您可以使用带有连接字符串而不是 DSN 的 Odbc 数据提供程序。插入指定 Excel 文件的连接字符串后,这也失败了,
我被告知不要使用脚本任务来完成此操作,即使在尝试通过索引访问工作表的最后努力从工作表中提取数据之后,我发现索引不同Excel文件中的工作表是不同的
任何帮助将不胜感激
I need to create an SSIS package for importing data from multiple Excel files into an SQL database. I plan on using nested Foreach Loop containers to achieve this. One Foreach File Enumerator and nested within that, a Foreach ADO.net Schema Rowset Enumerator
Problem to consider: Sheet names are different between excel files but structure remains the same.
I have created an Excel Connection Manager, but the Schema Rowset Enumerator is not accepting the connection manager in the Enumerator configuration.
After researching, I found that you can use the Jet Ole db provider to connect to an excel file. However, I can only specify Microsoft Access Database Files as the data source. Attempting to insert an Excel File as the data source fails
After more research I found that you can use the Odbc Data Provider with a connection string instead of a DSN. After inserting a connection string specifying the Excel file this also failed
I have been told not to use a Script Task to accomplish this and even after trying a last ditch effort to extract data from sheets be accessing the sheets by index I found that the index for the sheets in the different excel files are different
Any help would be greatly appreciated
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这是一种可能的方法,假设 Excel 文件中不存在任何空白工作表,并且所有工作表都遵循完全相同的结构。另外,假设文件扩展名仅为
.xlsx
以下示例是使用 SSIS 2008 R2 和 Excel 2007 创建的。本示例的工作文件夹为
F:\Temp\
在文件夹路径
F:\Temp\
中,创建一个名为States_1.xlsx< 的 Excel 2007 电子表格文件/code> 有两个工作表。
States_1.xlsx 的
工作表 1
包含以下数据< States_1.xlsx 的 code>Sheet 2 包含以下数据
在文件夹路径
F:\Temp\
中,创建另一个名为States_2.xlsx
的 Excel 2007 电子表格文件,其中包含两个工作表。States_2.xlsx 的
工作表 1
包含以下数据< States_2.xlsx 的 code>Sheet 2 包含以下数据
使用以下创建脚本在 SQL Server 中创建一个名为 dbo.Destination 的表。 Excel 工作表数据将插入到该表中。
该表当前为空。
创建一个新的 SSIS 包,并在该包上创建以下 4 个变量。 FolderPath 将包含存储 Excel 文件的文件夹。 FilePattern 将包含将循环通过的文件的扩展名,此示例仅适用于
.xlsx
。 FilePath 将由 Foreach 循环容器分配一个值,但我们需要一个有效的路径来开始设计,并且当前使用路径F:\Temp\States_1.xlsx 填充该路径第一个 Excel 文件的
。 SheetName 将包含实际的工作表名称,但我们需要填充初始值Sheet1$
以避免设计时错误。在包的连接管理器中,使用以下配置创建 ADO.NET 连接并将其命名为 ExcelSchema< /强>。
在 .Net OleDb 提供程序下选择提供程序
Microsoft Office 12.0 Access Database Engine OLE DB Provider
。提供文件路径F:\Temp\States_1.xlsx
单击
All
部分左侧并将属性 Extended Properties 设置为Excel 12.0
以表示 Excel 的版本。在本例中,12.0 表示Excel 2007
。单击“测试连接”以确保连接成功。创建一个名为 Excel 的 Excel 连接管理器,如下所示。
创建名为
SQLServer
的 OLE DB 连接 SQL Server。因此,我们应该在封装上有三个连接,如下所示。我们需要进行以下连接字符串更改,以便 Excel 文件在文件循环时动态更改。
在连接 ExcelSchema 上,配置表达式
ServerName
以使用变量FilePath
。单击省略号按钮以配置表达式。同样,在连接 Excel 上,配置表达式
ServerName
使用变量FilePath
。单击省略号按钮以配置表达式。在控制流上,将两个 Foreach 循环容器依次放置。第一个名为 Loop 文件的
Foreach Loop 容器
将循环访问这些文件。第二个Foreach 循环容器
将遍历容器内的工作表。在内部 For 每个循环容器中,放置一个数据流任务,该任务将读取 Excel 文件并将数据加载到 SQL配置名为 Loop files 的第一个 Foreach 循环容器,如下所示:
配置名为 Loop Sheets 的第一个 Foreach 循环容器,如下所示:
在数据流任务内,放置 Excel 源、派生列和 OLE DB 目标,如下所示:
配置 Excel 源以读取相应的 Excel 文件和当前正在循环的工作表。
配置派生列以为文件名和工作表名称创建新列。这只是为了演示这个例子,没有任何意义。
配置 OLE DB 目标以将数据插入 SQL 表中。
下面的屏幕截图显示了包的成功执行。
下面的屏幕截图显示,在本答案开头创建的 2 个 Excel 电子表格中的 4 个工作簿中的数据是正确的加载到 SQL 表 dbo.Destination 中。
Here is one possible way of doing this based on the assumption that there will not be any blank sheets in the Excel files and also all the sheets follow the exact same structure. Also, under the assumption that the file extension is only
.xlsx
Following example was created using SSIS 2008 R2 and Excel 2007. The working folder for this example is
F:\Temp\
In the folder path
F:\Temp\
, create an Excel 2007 spreadsheet file namedStates_1.xlsx
with two worksheets.Sheet 1
of States_1.xlsx contained the following dataSheet 2
of States_1.xlsx contained the following dataIn the folder path
F:\Temp\
, create another Excel 2007 spreadsheet file namedStates_2.xlsx
with two worksheets.Sheet 1
of States_2.xlsx contained the following dataSheet 2
of States_2.xlsx contained the following dataCreate a table in SQL Server named dbo.Destination using the below create script. Excel sheet data will be inserted into this table.
The table is currently empty.
Create a new SSIS package and on the package, create the following 4 variables. FolderPath will contain the folder where the Excel files are stored. FilePattern will contain the extension of the files that will be looped through and this example works only for
.xlsx
. FilePath will be assigned with a value by the Foreach Loop container but we need a valid path to begin with for design time and it is currently populated with the pathF:\Temp\States_1.xlsx
of the first Excel file. SheetName will contain the actual sheet name but we need to populate with initial valueSheet1$
to avoid design time error.In the package's connection manager, create an ADO.NET connection with the following configuration and name it as ExcelSchema.
Select the provider
Microsoft Office 12.0 Access Database Engine OLE DB Provider
under .Net Providers for OleDb. Provide the file pathF:\Temp\States_1.xlsx
Click on the
All
section on the left side and set the property Extended Properties toExcel 12.0
to denote the version of Excel. Here in this case 12.0 denotesExcel 2007
. Click on the Test Connection to make sure that the connection succeeds.Create an Excel connection manager named Excel as shown below.
Create an OLE DB Connection SQL Server named
SQLServer
. So, we should have three connections on the package as shown below.We need to do the following connection string changes so that the Excel file is dynamically changed as the files are looped through.
On the connection ExcelSchema, configure the expression
ServerName
to use the variableFilePath
. Click on the ellipsis button to configure the expression.Similarly on the connection Excel, configure the expression
ServerName
to use the variableFilePath
. Click on the ellipsis button to configure the expression.On the Control Flow, place two Foreach Loop containers one within the other. The first
Foreach Loop container
named Loop files will loop through the files. The secondForeach Loop container
will through the sheets within the container. Within the inner For each loop container, place a Data Flow Task that will read the Excel files and load data into SQLConfigure the first Foreach loop container named Loop files as shown below:
Configure the first Foreach loop container named Loop sheets as shown below:
Inside the data flow task, place an Excel Source, Derived Column and OLE DB Destination as shown below:
Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.
Configure the derived column to create new columns for file name and sheet name. This is just to demonstrate this example but has no significance.
Configure the OLE DB destination to insert the data into the SQL table.
Below screenshot shows successful execution of the package.
Below screenshot shows that data from the 4 workbooks in 2 Excel spreadsheets that were creating in the beginning of this answer is correctly loaded into the SQL table dbo.Destination.
我遇到了一篇文章,该文章说明了一种方法,可以将同一 Excel 工作表中的数据导入到所选表中,直到 excel 中的数据类型没有任何修改。
如果数据被插入或被新数据覆盖,导入过程将成功完成,并且数据将被添加到 SQL 数据库中的表中。
该文章可以在这里找到:http://www.sqlshack.com/ using-ssis-packages-import-ms-excel-data-database/
希望有帮助。
I ran into an article that illustrates a method where the data from the same excel sheet can be imported in the selected table until there is no modifications in excel with data types.
If the data is inserted or overwritten with new ones, importing process will be successfully accomplished, and the data will be added to the table in SQL database.
The article may be found here: http://www.sqlshack.com/using-ssis-packages-import-ms-excel-data-database/
Hope it helps.
我遇到了类似的问题,发现尽快删除 Excel 文件要简单得多。作为包的第一步,我使用 Powershell 将 Excel 文件中的数据提取到 CSV 文件中。我自己的 Excel 文件很简单,但这里
使用 PowerShell 提取所有 Excel 工作表并将其转换为 CSV 文件
是 Tim Smith 撰写的一篇出色的文章,介绍了从多个 Excel 文件和/或多个文件中提取数据床单。
Excel 文件转换为 CSV 后,数据导入就简单多了。
I had a similar issue and found that it was much simpler to to get rid of the Excel files as soon as possible. As part of the first steps in my package I used Powershell to extract the data out of the Excel files into CSV files. My own Excel files were simple but here
Extract and convert all Excel worksheets into CSV files using PowerShell
is an excellent article by Tim Smith on extracting data from multiple Excel files and/or multiple sheets.
Once the Excel files have been converted to CSV the data import is much less complicated.