如何合并具有自动编号主键的表?
我想每个人都会偶尔遇到这个问题:您有两个具有需要合并的自动编号主键的表。使用自动编号主键来支持应用程序生成的键有很多充分的理由,但与其他表合并肯定是最大的缺点之一。
出现的一些问题是 ID 重叠和外键不同步。我想听听您解决这个问题的方法。我总是遇到问题,所以我很好奇是否有人有某种通用的解决方案。
-- 编辑--
针对建议使用 guid 或其他非数字键的答案,在某些情况下,提前使用自动编号键似乎是一个更好的主意(稍后您会后悔这一点) ),或者您正在接管别人的项目,或者您获得了一些必须使用的遗留数据库。所以我真的在寻找一种您无法再控制数据库设计的解决方案。
I suppose everyone runs into this problem once in a while: you have two tables that have autonumber primary keys that need to be merged. There are many good reasons why autonumber primary keys are used in favour of say application-generated keys, but merging with other tables must be one of the biggest drawbacks.
Some problems that arise are overlapping ids and out of sync foreign keys. I would like to hear your approach for tackling this. I always run into problems, so I'm very curious if anybody has some sort of a general solution.
-- EDIT --
In response to the answers suggesting to use guids or other non-numeric keys, there are situations where in advance it just seems a better idea to use autonumber keys (and you regret this later), or you're taking over someone else's project, or you get some legacy database that you have to work with. So I'm really looking for a solution where you have no control over the database design anymore.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
解决方案包括:
使用 GUID 作为主键而不是更简单的身份字段。很有可能避免重叠,但 GUID 更难使用,并且不能很好地与聚集索引配合使用。
将主键设为多列键,第二列通过识别合并数据的来源来解决重叠值。可移植,与聚集索引配合使用效果更好,但开发人员讨厌多列键。
使用自然键而不是伪键。
为合并表之一分配新的主键值,并将这些更改级联到任何相关行。这将合并操作更改为 ETL 操作。如果您无法更改数据库设计,这是可用于遗留数据的唯一解决方案。
我不确定是否存在一刀切的解决方案。根据情况选择其中之一。
Solutions include:
Use GUIDs as primary keys instead of a simpler identity field. Very likely to avoid overlaps, but GUIDs are harder to use and don't play nicely with clustered indexes.
Make the primary key into a multi-column key, the second column resolving overlapping values by identifying the source of the merged data. Portable, works better with clustered indexes, but developers hate multi-column keys.
Use natural keys instead of pseudokeys.
Allocate new primary key values for one of the merged tables, and cascade these changes to any dependent rows. This changes a merge operation into an ETL operation. This is the only solution you can use for legacy data, if you can't change the database design.
I'm not sure there's a one-size-fits-all solution. Choose one of these based on the situation.
嗯,我对我刚刚在 AlexKuznetsov 的答案中发表评论的想法很感兴趣,所以我将对此做出完整的回答。
假设表被命名为 table1 和 table2,其中 id1 和 id2 作为自动编号主键。它们将被合并到带有 id3(非自动编号主键)的 table3 中。
为什么不:
UPDATE table SET id1 = id1 * 2
,对于引用 table2 的 FK 字段,执行 < code>UPDATE table SET id2 = (id2) * 2 + 1它甚至可以使用 3 个或更多表,只需使用更高的乘数即可。
Hm, I'm kind of enthousiastic about the idea that I just put in a comment at AlexKuznetsov's answer, so I'll make a whole answer about it.
Consider the tables to be named table1 and table2, with id1 and id2 as autonumber primary keys. They will be merged to table3 with id3 (a non-autonumber primary key).
Why not:
UPDATE table SET id1 = id1 * 2
, and for FK fields referring to table2, execute anUPDATE table SET id2 = (id2) * 2 + 1
INSERT INTO table3 SELECT id1 * 2 AS id3, ... FROM table1 UNION ALL SELECT id2 * 2 + 1 AS id3 FROM table2
It can even work with 3 or more tables, just by using a higher multiplier.
针对这种可能性进行设计的标准方法之一(如果不是标准)是使用 GUID 作为主键而不是整数 - 这样合并就相对轻松,因为您可以保证不要遇到重叠。
除非重新设计,否则我认为您必须插入到表中,接受您将获得新的主键,并确保维护从旧 ID 到新 ID 的映射 - 然后插入引用数据如果您的数据有一个在插入后保持唯一的“业务键”,那么这将节省跟踪映射的时间。
One of the standard approaches (if not the standard approach), where you're designing for such an eventuality, is to use GUIDs for primary keys rather than integers - merging is then relatively painless as you are guaranteed not to encounter an overlap.
Barring a redesign, tho', I think you're stuck with having to insert into the table, accept that you'll get new primary keys, and ensure that you maintain the mapping from old-to-new ID - then insert referencing data with FK remapped etc. etc. If you data has a "business key" that will remain unique after the insert, this would save on having to keep track of the mapping.
如果您确定只有两个这样的表,则可以在一个表中使用偶数 ID(0,2,4,6,...),在另一个表中使用奇数 ID(1,3,5,7,...) )
I fyou are sure you have only two such tables, you can just have even IDs in one table (0,2,4,6,...) and odd IDs in another (1,3,5,7,...)
假设您在要合并的表中也有一个自然键,那么该过程并不困难。自然键用于删除重复数据并正确地重新分配任何引用。您可以随时对代理键值重新编号 - 这是使用代理项的主要优点之一。
因此,我不认为这是代理键的问题 - 前提是您始终强制使用自然键(实际上我更喜欢术语“业务密钥”)。如果您还没有这些表的业务键,那么现在可能是重新设计的好时机,以便正确实现所有必要的键。
Assuming you also have a natural key in the tables to be merged then the process isn't difficult. The natural key is used to deduplicate and to correctly reassign any references. You can renumber the surrogate key values at any time - that being one of the principal advantages of using a surrogate in the first place.
So I don't see this as a problem with surrogate keys - provided you always enforce the natural key (actually I much prefer the term "business key"). If you haven't got business keys for these tables, well maybe now would be a good time to redesign so that ALL the necessary keys are properly implemented.
下面是它在我们的 MssqlMerge 中的实现方式,类似于 @WillA 建议的“维护从旧 ID 到新 ID 的映射”
假设我们有两个表:
现在,当我们将 [Parent] 行从一个数据库合并到另一个数据库时,我们执行以下步骤
:在目标数据库中创建包含列(SourceID、TargetID)的映射表
b. INSERT [Parent] 行包含源表中的数据,不指定 ID:
c。 INSERT [Child] 行,从映射中获取 [ParentID]:
过滤器可以应用于父表行。子行将被筛选为仅包含具有与父表筛选器相对应的 ParentID 的记录。更进一步,我们可能有一个类似 Grandchild 的表,在 ChildID 上引用 Child,并且我们也可以选择导入 Grandchild 行,仅过滤具有与所选父级相关的 ChildID 的行。依此类推,对于任何外键关系深度,每个表都与其自己的映射合并。
合并后映射表可以保留在目标数据库中,并且可用于许多目标:
有关从属表的信息取自 sys.foreign_keys 和 sys.foreign_key_columns。
目前这种方法的主要限制是它不能用于参与循环外键的表(例如A同时引用B和B同时引用A)。有一种方法可以解决这个问题,即打破循环 - 忽略其中一个外键,然后在 INSERT 完成后更新它们,但这是进一步改进的主题。
Here is how it is implemented in our MssqlMerge, kind of "maintain the mapping from old-to-new ID" suggested by @WillA
Let's say we have two tables:
Now when we merge [Parent] rows to from one db to another, we do the following steps:
a. in target DB create mapping table with columns (SourceID, TargetID)
b. INSERT [Parent] row with data from source table, without specifying ID:
c. INSERT [Child] row, taking [ParentID] from mapping:
Filters can be applied to the Parent table rows. Child rows are filtered to include only records with ParentID corresponding to the Parent table filter. Going further, we may have a table like Grandchild referencing Child on ChildID and we may select to import Grandchild rows as well, filtering only rows having ChildID related to selected parents. And so on for any foreign key relations depth, each table is merged with its own mapping.
Mapping tables can be left in the target database after merge and can be useful for a number of goals:
Information about dependent tables is taken from sys.foreign_keys and sys.foreign_key_columns.
Currently the major limitation of this approach is that it can not be used for tables participating in cycled foreign keys (like A references B and B references A at the same time). There is the way around this by breaking the cycle - ignoring one of the foreign keys and then UPDATEing them after INSERTs completion, but that's the subject of further improvements.