如何使用 SSIS 包循环遍历 Excel 文件并将其加载到数据库中?

发布于 2024-12-04 14:29:13 字数 512 浏览 0 评论 0原文

我需要创建一个 SSIS 包,用于将多个 Excel 文件中的数据导入到 SQL 数据库中。我计划使用嵌套的 Foreach 循环容器来实现此目的。一个 Foreach 文件枚举器,并嵌套在其中,一个 Foreach ADO.net 架构行集枚举器

要考虑的问题:Excel 文件之间的工作表名称不同,但结构保持不变。

我已创建 Excel 连接管理器,但架构行集枚举器不接受枚举器配置中的连接管理器。

经过研究,我发现您可以使用 Jet Ole db 提供程序连接到 Excel 文件。但是,我只能指定 Microsoft Access 数据库文件作为数据源。尝试插入 Excel 文件作为数据源失败

经过更多研究,我发现您可以使用带有连接字符串而不是 DSN 的 Odbc 数据提供程序。插入指定 Excel 文件的连接字符串后,这也失败了,

我被告知不要使用脚本任务来完成此操作,即使在尝试通过索引访问工作表的最后努力从工作表中提取数据之后,我发现索引不同Excel文件中的工作表是不同的

任何帮助将不胜感激

I need to create an SSIS package for importing data from multiple Excel files into an SQL database. I plan on using nested Foreach Loop containers to achieve this. One Foreach File Enumerator and nested within that, a Foreach ADO.net Schema Rowset Enumerator

Problem to consider: Sheet names are different between excel files but structure remains the same.

I have created an Excel Connection Manager, but the Schema Rowset Enumerator is not accepting the connection manager in the Enumerator configuration.

After researching, I found that you can use the Jet Ole db provider to connect to an excel file. However, I can only specify Microsoft Access Database Files as the data source. Attempting to insert an Excel File as the data source fails

After more research I found that you can use the Odbc Data Provider with a connection string instead of a DSN. After inserting a connection string specifying the Excel file this also failed

I have been told not to use a Script Task to accomplish this and even after trying a last ditch effort to extract data from sheets be accessing the sheets by index I found that the index for the sheets in the different excel files are different

Any help would be greatly appreciated

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

寻梦旅人 2024-12-11 14:29:13

这是一种可能的方法,假设 Excel 文件中不存在任何空白工作表,并且所有工作表都遵循完全相同的结构。另外,假设文件扩展名仅为 .xlsx

以下示例是使用 SSIS 2008 R2Excel 2007 创建的。本示例的工作文件夹为 F:\Temp\

在文件夹路径 F:\Temp\ 中,创建一个名为 States_1.xlsx< 的 Excel 2007 电子表格文件/code> 有两个工作表。

States_1.xlsx工作表 1 包含以下数据

States_1_Sheet_1

< States_1.xlsx 的 code>Sheet 2 包含以下数据

States_1_Sheet_2

在文件夹路径 F:\Temp\ 中,创建另一个名为 States_2.xlsx 的 Excel 2007 电子表格文件,其中包含两个工作表。

States_2.xlsx工作表 1 包含以下数据

States_2_Sheet_1

< States_2.xlsx 的 code>Sheet 2 包含以下数据

States_2_Sheet_2

使用以下创建脚本在 SQL Server 中创建一个名为 dbo.Destination 的表。 Excel 工作表数据将插入到该表中。

CREATE TABLE [dbo].[Destination](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [State] [nvarchar](255) NULL,
    [Country] [nvarchar](255) NULL,
    [FilePath] [nvarchar](255) NULL,
    [SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

该表当前为空。

Empty table

创建一个新的 SSIS 包,并在该包上创建以下 4 个变量。 FolderPath 将包含存储 Excel 文件的文件夹。 FilePattern 将包含将循环通过的文件的扩展名,此示例仅适用于 .xlsxFilePath 将由 Foreach 循环容器分配一个值,但我们需要一个有效的路径来开始设计,并且当前使用路径 F:\Temp\States_1.xlsx 填充该路径第一个 Excel 文件的SheetName 将包含实际的工作表名称,但我们需要填充初始值 Sheet1$ 以避免设计时错误。

Variables

在包的连接管理器中,使用以下配置创建 ADO.NET 连接并将其命名为 ExcelSchema< /强>。

在 .Net OleDb 提供程序下选择提供程序 Microsoft Office 12.0 Access Database Engine OLE DB Provider。提供文件路径 F:\Temp\States_1.xlsx

ExcelSchema 1

单击 All 部分左侧并将属性 Extended Properties 设置为 Excel 12.0 以表示 Excel 的版本。在本例中,12.0 表示 Excel 2007。单击“测试连接”以确保连接成功。

ExcelSchema 2

创建一个名为 Excel 的 Excel 连接管理器,如下所示。

Excel

创建名为 SQLServer 的 OLE DB 连接 SQL Server。因此,我们应该在封装上有三个连接,如下所示。

Connections

我们需要进行以下连接字符串更改,以便 Excel 文件在文件循环时动态更改。

在连接 ExcelSchema 上,配置表达式 ServerName 以使用变量 FilePath。单击省略号按钮以配置表达式。

ExcelSchema ServerName

同样,在连接 Excel 上,配置表达式 ServerName使用变量FilePath。单击省略号按钮以配置表达式。

Excel ServerName

在控制流上,将两个 Foreach 循环容器依次放置。第一个名为 Loop 文件的 Foreach Loop 容器 将循环访问这些文件。第二个 Foreach 循环容器 将遍历容器内的工作表。在内部 For 每个循环容器中,放置一个数据流任务,该任务将读取 Excel 文件并将数据加载到 SQL

Control Flow

配置名为 Loop files 的第一个 Foreach 循环容器,如下所示:

Foreach Loop 1 Collection

Foreach Loop 1 Variable Mappings

配置名为 Loop Sheets 的第一个 Foreach 循环容器,如下所示:

Foreach 循环 2 集合

Foreach 循环2 变量映射

在数据流任务内,放置 Excel 源、派生列和 OLE DB 目标,如下所示:

配置 Excel 源以读取相应的 Excel 文件和当前正在循环的工作表。

Excel 源连接管理器

 Excel 源列

配置派生列以为文件名和工作表名称创建新列。这只是为了演示这个例子,没有任何意义。

派生列

配置 OLE DB 目标以将数据插入 SQL 表中。

OLE DB 目标连接管理器

“OLE

下面的屏幕截图显示了包的成功执行。

执行成功

下面的屏幕截图显示,在本答案开头创建的 2 个 Excel 电子表格中的 4 个工作簿中的数据是正确的加载到 SQL 表 dbo.Destination 中。

SQL 表

Here is one possible way of doing this based on the assumption that there will not be any blank sheets in the Excel files and also all the sheets follow the exact same structure. Also, under the assumption that the file extension is only .xlsx

Following example was created using SSIS 2008 R2 and Excel 2007. The working folder for this example is F:\Temp\

In the folder path F:\Temp\, create an Excel 2007 spreadsheet file named States_1.xlsx with two worksheets.

Sheet 1 of States_1.xlsx contained the following data

States_1_Sheet_1

Sheet 2 of States_1.xlsx contained the following data

States_1_Sheet_2

In the folder path F:\Temp\, create another Excel 2007 spreadsheet file named States_2.xlsx with two worksheets.

Sheet 1 of States_2.xlsx contained the following data

States_2_Sheet_1

Sheet 2 of States_2.xlsx contained the following data

States_2_Sheet_2

Create a table in SQL Server named dbo.Destination using the below create script. Excel sheet data will be inserted into this table.

CREATE TABLE [dbo].[Destination](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [State] [nvarchar](255) NULL,
    [Country] [nvarchar](255) NULL,
    [FilePath] [nvarchar](255) NULL,
    [SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

The table is currently empty.

Empty table

Create a new SSIS package and on the package, create the following 4 variables. FolderPath will contain the folder where the Excel files are stored. FilePattern will contain the extension of the files that will be looped through and this example works only for .xlsx. FilePath will be assigned with a value by the Foreach Loop container but we need a valid path to begin with for design time and it is currently populated with the path F:\Temp\States_1.xlsx of the first Excel file. SheetName will contain the actual sheet name but we need to populate with initial value Sheet1$ to avoid design time error.

Variables

In the package's connection manager, create an ADO.NET connection with the following configuration and name it as ExcelSchema.

Select the provider Microsoft Office 12.0 Access Database Engine OLE DB Provider under .Net Providers for OleDb. Provide the file path F:\Temp\States_1.xlsx

ExcelSchema 1

Click on the All section on the left side and set the property Extended Properties to Excel 12.0 to denote the version of Excel. Here in this case 12.0 denotes Excel 2007. Click on the Test Connection to make sure that the connection succeeds.

ExcelSchema 2

Create an Excel connection manager named Excel as shown below.

Excel

Create an OLE DB Connection SQL Server named SQLServer. So, we should have three connections on the package as shown below.

Connections

We need to do the following connection string changes so that the Excel file is dynamically changed as the files are looped through.

On the connection ExcelSchema, configure the expression ServerName to use the variable FilePath. Click on the ellipsis button to configure the expression.

ExcelSchema ServerName

Similarly on the connection Excel, configure the expression ServerName to use the variable FilePath. Click on the ellipsis button to configure the expression.

Excel ServerName

On the Control Flow, place two Foreach Loop containers one within the other. The first Foreach Loop container named Loop files will loop through the files. The second Foreach Loop container will through the sheets within the container. Within the inner For each loop container, place a Data Flow Task that will read the Excel files and load data into SQL

Control Flow

Configure the first Foreach loop container named Loop files as shown below:

Foreach Loop 1 Collection

Foreach Loop 1 Variable Mappings

Configure the first Foreach loop container named Loop sheets as shown below:

Foreach Loop 2 Collection

Foreach Loop 2 Variable Mappings

Inside the data flow task, place an Excel Source, Derived Column and OLE DB Destination as shown below:

Data Flow Task

Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.

Excel Source Connection Manager

Excel Source Columns

Configure the derived column to create new columns for file name and sheet name. This is just to demonstrate this example but has no significance.

Derived column

Configure the OLE DB destination to insert the data into the SQL table.

OLE DB Destination Connection Manager

OLE DB Destination Columns

Below screenshot shows successful execution of the package.

Execution successful

Below screenshot shows that data from the 4 workbooks in 2 Excel spreadsheets that were creating in the beginning of this answer is correctly loaded into the SQL table dbo.Destination.

SQL table

向地狱狂奔 2024-12-11 14:29:13

我遇到了一篇文章,该文章说明了一种方法,可以将同一 Excel 工作表中的数据导入到所选表中,直到 excel 中的数据类型没有任何修改。

如果数据被插入或被新数据覆盖,导入过程将成功完成,并且数据将被添加到 SQL 数据库中的表中。

该文章可以在这里找到:http://www.sqlshack.com/ using-ssis-packages-import-ms-excel-data-database/

希望有帮助。

I ran into an article that illustrates a method where the data from the same excel sheet can be imported in the selected table until there is no modifications in excel with data types.

If the data is inserted or overwritten with new ones, importing process will be successfully accomplished, and the data will be added to the table in SQL database.

The article may be found here: http://www.sqlshack.com/using-ssis-packages-import-ms-excel-data-database/

Hope it helps.

梦与时光遇 2024-12-11 14:29:13

我遇到了类似的问题,发现尽快删除 Excel 文件要简单得多。作为包的第一步,我使用 Powershell 将 Excel 文件中的数据提取到 CSV 文件中。我自己的 Excel 文件很简单,但这里

使用 PowerShell 提取所有 Excel 工作表并将其转换为 CSV 文件

是 Tim Smith 撰写的一篇出色的文章,介绍了从多个 Excel 文件和/或多个文件中提取数据床单。

Excel 文件转换为 CSV 后,数据导入就简单多了。

I had a similar issue and found that it was much simpler to to get rid of the Excel files as soon as possible. As part of the first steps in my package I used Powershell to extract the data out of the Excel files into CSV files. My own Excel files were simple but here

Extract and convert all Excel worksheets into CSV files using PowerShell

is an excellent article by Tim Smith on extracting data from multiple Excel files and/or multiple sheets.

Once the Excel files have been converted to CSV the data import is much less complicated.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文