How to use BULK INSERT when rows depend on foreign key values?

Published 2024-12-19 07:25:10


My question is related to this one I asked on ServerFault.

Based on this, I've considered using BULK INSERT. I now understand that I have to prepare a file for each entity I want to save into the database. Regardless, I still wonder whether BULK INSERT will avoid the memory issue on my system described in the referenced ServerFault question.

As for the Streets table, it's quite simple! I have only two cities and five sectors to care about as the foreign keys. But then, how about the Addresses? The Addresses table is structured like this:

AddressId int not null identity(1,1) primary key
StreetNumber int null
NumberSuffix_Value int not null DEFAULT 0
StreetId int null references Streets (StreetId)
CityId int not null references Cities (CityId)
SectorId int null references Sectors (SectorId)

As I said on ServerFault, I have about 35,000 addresses to insert. Shall I memorize all the IDs? =P

And then there are the citizens to insert, each of whom is associated with an address.

PersonId int not null identity(1,1) primary key
Surname nvarchar not null
FirstName nvarchar not null
IsActive bit
AddressId int null references Addresses (AddressId)

The only thing I can think of is to force the IDs to static values, but then I lose the flexibility I had with my former INSERT..SELECT strategy.

What are then my options?

  1. I force the IDs to always be the same; then I have to SET IDENTITY_INSERT ON so that I can force the values into the table. This way, each of my rows always has the same ID, just as suggested here.

  2. How do I BULK INSERT with foreign keys? I can't find any docs on this anywhere. =(
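Option 1 above can be sketched like this. The column list and values are purely illustrative, since the real Streets schema isn't shown in the question; note that SET IDENTITY_INSERT requires an explicit column list on the INSERT.

```sql
-- Hypothetical illustration of option 1: forcing explicit identity values.
SET IDENTITY_INSERT Streets ON;

INSERT INTO Streets (StreetId, StreetName)   -- column list required with IDENTITY_INSERT
VALUES (1, N'Main Street'),
       (2, N'Oak Avenue');

SET IDENTITY_INSERT Streets OFF;
```

With the IDs pinned this way, the Addresses file can reference StreetId = 1 or 2 with confidence, at the cost of maintaining those IDs by hand.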

Thanks for your kind assistance!

EDIT

I edited in order to include the BULK INSERT SQL instruction that finally worked for me!

I had my Excel workbook ready with the information I needed to insert. So I simply created a few supplemental worksheets and began writing formulas to "import" the data into these new sheets, one for each of my entities.

  1. Streets;
  2. Addresses;
  3. Citizens.

As for the two other entities, it wasn't worth bulk-inserting them, as I had only two cities and five sectors (city subdivisions) to insert. Once both the cities and sectors were inserted, I noted their respective IDs and began preparing my record sets for bulk insert. Using the power of Excel to compute the values and "import" the foreign keys was a charm in itself, by the way. Afterwards, I saved each worksheet to a separate CSV file. My records were then ready to be bulk-loaded.
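As an illustration, a Streets.csv prepared this way might look like the following. The column names are assumptions, since the Streets schema isn't shown; the key points are the header row (hence FIRSTROW = 2 below) and the explicit StreetId values carried in the file (hence KEEPIDENTITY).

```csv
StreetId,StreetName,CityId
1,Main Street,1
2,Oak Avenue,1
3,Elm Street,2
```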

USE [DatabaseName]
GO

delete from Citizens
delete from Addresses
delete from Streets

BULK INSERT Streets
    FROM N'C:\SomeFolder\SomeSubfolder\Streets.csv'
    WITH (
        FIRSTROW = 2
        , KEEPIDENTITY
        , FIELDTERMINATOR = N','
        , ROWTERMINATOR = N'\n'
        , CODEPAGE = N'ACP'
    )
GO
  • FIRSTROW

    Indicates the row number at which the insert begins. In my situation, my CSVs contained column headers, so the second row was the one to begin with. One could also start anywhere in the file, say at the 15th row.

  • KEEPIDENTITY

    Allows the bulk insert to use the entity IDs specified in the file even though the table has an identity column. This option has the same effect as running SET IDENTITY_INSERT my_table ON before inserting rows when you wish to insert with precise IDs.

As for the other parameters, they speak for themselves.

Now that this is explained, the same code was repeated for each of the two remaining entities to insert Addresses and Citizens. Because KEEPIDENTITY was specified, all of my foreign keys remained intact, even though my primary keys are set as identity columns in SQL Server.
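For instance, the Addresses load is the same statement with only the table and file names changed (the path here mirrors the illustrative one above):

```sql
BULK INSERT Addresses
    FROM N'C:\SomeFolder\SomeSubfolder\Addresses.csv'
    WITH (
        FIRSTROW = 2
        , KEEPIDENTITY            -- keep the AddressId values from the file
        , FIELDTERMINATOR = N','
        , ROWTERMINATOR = N'\n'
        , CODEPAGE = N'ACP'
    )
GO
```

Because the Citizens file references those same AddressId values, loading order matters: Streets, then Addresses, then Citizens, so each foreign key target exists before it is referenced.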

One tweak, though, exactly as marc_s said in his answer: import your data as fast as you can into a staging table with no restrictions at all. That way you make your life much easier while still following good practice. =)


Answer by 温柔女人霸气范 (2024-12-26 07:25:10)


The basic idea is to bulk insert your data into a staging table that doesn't have any restrictions, any constraints etc. - just bulk load the data as fast as you can.

Once you have the data in the staging table, then you need to start to worry about constraints etc. when you insert the data from the staging table into the real tables.

Here, you could e.g.

  • insert only those rows into your real work tables that match all the criteria (and mark them as "successfully inserted" in your staging table)

  • handle all rows left in the staging table that weren't successfully inserted through some error / recovery process - whatever that may be: printing a report with all the "problem" rows, tossing them into an "error bin", or anything else - totally up to you.

Key point is: the actual BULK INSERT should go into a totally unconstrained table - just load the data as fast as you can - and only then, in a second step, start to worry about constraints, lookup data, references, and the like.
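A minimal sketch of this staging pattern, applied to the Citizens load from the question. The staging table name, the nvarchar lengths, and the validation rule are assumptions for illustration; the point is that step 1 has no keys or constraints to check, and step 2 only moves rows whose foreign key actually resolves.

```sql
-- Assumed staging table: same columns as Citizens, but no keys or constraints.
CREATE TABLE Citizens_Staging (
    PersonId  int,
    Surname   nvarchar(100),
    FirstName nvarchar(100),
    IsActive  bit,
    AddressId int
);

-- Step 1: load as fast as possible; nothing to validate here.
BULK INSERT Citizens_Staging
    FROM N'C:\SomeFolder\SomeSubfolder\Citizens.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = N',', ROWTERMINATOR = N'\n');

-- Step 2: move only rows whose AddressId is NULL or resolves to a real address.
SET IDENTITY_INSERT Citizens ON;

INSERT INTO Citizens (PersonId, Surname, FirstName, IsActive, AddressId)
SELECT s.PersonId, s.Surname, s.FirstName, s.IsActive, s.AddressId
FROM Citizens_Staging AS s
WHERE s.AddressId IS NULL
   OR EXISTS (SELECT 1 FROM Addresses AS a WHERE a.AddressId = s.AddressId);

SET IDENTITY_INSERT Citizens OFF;
```

Rows left behind in Citizens_Staging after step 2 are exactly the "problem" rows the answer mentions, ready for a report or an error bin.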
