How do I correctly reserve identity values for use in a database?

Posted 2024-09-06 02:34:58, 271 words, 5 views, 0 comments

We have some code in which we need to maintain our own identity (PK) column in SQL. We have a table into which we bulk insert data, but we add data to related tables before the bulk insert is done, so we cannot use an IDENTITY column and find out the value up front.

The current code selects the MAX value of the field and increments it by 1. Although it is highly unlikely that two instances of our application will run at the same time, this is still not thread-safe (not to mention that it goes to the database every time).

I am using the ADO.NET entity model. How would I go about 'reserving' a range of IDs to use and, when that range runs out, grabbing a new block, with a guarantee that the same range will never be handed out twice?

Comments (7)

半步萧音过轻尘 2024-09-13 02:35:10

The most general solution is to generate client-side identifiers that never collide with database identifiers (usually negative values), and then replace them with the identifiers the database generates on insert.

This approach is safe in applications where many users insert data simultaneously. Any other approach, except GUIDs, is not multiuser-safe.

But if you have the rare case where an entity's primary key must be known before the entity is saved to the database, and GUIDs are not an option, you can use an identifier-generation algorithm that prevents overlapping identifiers.
The simplest is to assign a unique identifier prefix to each connected client and prepend it to every identifier that client generates.

If you are using the ADO.NET Entity Framework, you probably do not need to worry about identifier generation: let the database generate the key and have the framework read it back, by marking the entity's primary key as store-generated (IsDbGenerated=true in LINQ to SQL; StoreGeneratedPattern set to Identity in an Entity Framework model).

Strictly speaking, Entity Framework, like other ORMs, does not need identifiers for objects that have not yet been saved to the database; an object reference is enough to work correctly with new entities. The actual primary key value is required only when updating or deleting an entity, and when updating, deleting, or inserting an entity that references a new entity, i.e. when the actual primary key value is about to be written to the database. If an entity is new, other entities that reference it cannot be saved until it has been saved, so ORMs save entities in a specific order that takes the reference map into account.

南冥有猫 2024-09-13 02:35:09

If you have a lot of child tables, you might not want to change the PK. Plus, integer fields are likely to perform better in joins. But you could still add a GUID field and populate it in the bulk insert with pre-generated values. Then you could leave the identity column alone (it is almost always a bad idea to turn it off) and use the pre-generated GUID values to get back the identity values you just inserted, for the inserts into the child tables.

If you use a regular set-based insert (one with a SELECT clause instead of a VALUES clause) instead of a bulk insert, you could use the OUTPUT clause to get the identities back for the rows if you are using SQL Server 2008.
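A minimal sketch of the OUTPUT approach described above. The table names (dbo.Parent, dbo.ParentStaging) and their columns are assumptions made up for illustration; they are not from the question.

```sql
-- Hypothetical tables; names and columns are assumptions for illustration.
CREATE TABLE dbo.Parent
(
    ParentId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    RowGuid  uniqueidentifier  NOT NULL,  -- pre-generated by the client
    Name     nvarchar(255)     NULL
);

CREATE TABLE dbo.ParentStaging
(
    RowGuid uniqueidentifier NOT NULL,
    Name    nvarchar(255)    NULL
);
GO

-- Holds the GUID-to-identity mapping produced by the insert
DECLARE @map TABLE
(
    ParentId int              NOT NULL,
    RowGuid  uniqueidentifier NOT NULL
);

-- Set-based insert: OUTPUT returns each generated identity value
-- together with the GUID that was supplied for the same row.
INSERT INTO dbo.Parent (RowGuid, Name)
OUTPUT inserted.ParentId, inserted.RowGuid INTO @map (ParentId, RowGuid)
SELECT RowGuid, Name
FROM dbo.ParentStaging;

-- The mapping can now drive the inserts into the child tables.
SELECT ParentId, RowGuid FROM @map;
```

Because the GUID travels with the row, the child rows prepared on the client only need to carry the GUID; the join against @map supplies the real integer key.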

余罪 2024-09-13 02:35:08

Two clients could reserve the same block of IDs.

There is no solution short of serializing your inserts by locking.

See Locking Hints in MSDN.
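As a rough illustration of serializing the existing MAX + 1 logic with locking hints, the sketch below takes an exclusive table lock for the duration of the transaction. The table dbo.Orders and its columns are hypothetical, and this is only one possible choice of hints.

```sql
-- Hypothetical table with a manually assigned integer primary key.
CREATE TABLE dbo.Orders
(
    OrderId      int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(255) NULL
);
GO

DECLARE @nextId int;

BEGIN TRANSACTION;

    -- TABLOCKX takes an exclusive table lock and HOLDLOCK keeps it until the
    -- transaction ends, so a second session running this batch waits here
    -- instead of reading the same MAX value.
    SELECT @nextId = ISNULL(MAX(OrderId), 0) + 1
    FROM dbo.Orders WITH (TABLOCKX, HOLDLOCK);

    INSERT INTO dbo.Orders (OrderId, CustomerName)
    VALUES (@nextId, N'Example');

COMMIT TRANSACTION;
```

The trade-off is throughput: every writer queues behind the table lock, which is why the other answers look for ways to reserve identifiers without locking the data table itself.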

等风也等你 2024-09-13 02:35:06

The Hi/Lo algorithm may be of interest to you:

What's the Hi/Lo algorithm?
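In the Hi/Lo idea, the database hands out a whole block of identifiers at once and the client consumes the block locally before asking for another. The server side could be sketched as below, under the assumption of a small allocator table; dbo.IdAllocator and dbo.ReserveIdRange are hypothetical names, not part of any library.

```sql
-- Hypothetical allocator table: one row per table whose ids are handed out.
CREATE TABLE dbo.IdAllocator
(
    TableName sysname NOT NULL PRIMARY KEY,
    NextId    int     NOT NULL   -- first id that has not been handed out yet
);
GO

INSERT INTO dbo.IdAllocator (TableName, NextId) VALUES (N'Orders', 1);
GO

CREATE PROCEDURE dbo.ReserveIdRange
    @TableName sysname,
    @BlockSize int,
    @FirstId   int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET @FirstId = NULL;

    -- One atomic statement: @FirstId receives the pre-update value of NextId
    -- and the counter moves past the reserved block, so two callers can
    -- never be given overlapping ranges.
    UPDATE dbo.IdAllocator
    SET @FirstId = NextId,
        NextId   = NextId + @BlockSize
    WHERE TableName = @TableName;

    IF @FirstId IS NULL
        RETURN -1;   -- no allocator row exists for this table

    RETURN 0;
END
GO

-- Usage: reserve 100 ids; the caller owns @first through @first + 99.
DECLARE @first int;
EXEC dbo.ReserveIdRange
     @TableName = N'Orders',
     @BlockSize = 100,
     @FirstId   = @first OUTPUT;
SELECT @first AS FirstId, @first + 99 AS LastId;
```

The client keeps a local counter starting at the returned first id and calls the procedure again only when the block is exhausted. Ids in a block that is never fully used are simply skipped, so gaps are expected.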

情绪失控 2024-09-13 02:35:04

Why are you using ADO.net Entity Framework to do what sounds like ETL work? (See critique of ADO.NET Entity Framework and ORM in general below. It is rant free).

Why use ints at all? Using a uniqueidentifier would solve the "multiple instances of the application running" issue.

Using a uniqueidentifier as a column default will be slower than using an int IDENTITY... it takes more time to generate a guid than an int. A guid is also larger (16 bytes) than an int (4 bytes). Try this first, and if it results in acceptable performance, run with it.

If the delay introduced by generating a guid on each row insert is unacceptable, create guids in bulk (or on another server) and cache them in a table.

Sample TSQL code:

CREATE TABLE testinsert
 (
  date_generated datetime   NOT NULL DEFAULT GETDATE(), 
  guid   uniqueidentifier NOT NULL, 
  TheValue  nvarchar(255)  NULL
 )
GO

CREATE TABLE guids 
 (
  guid   uniqueidentifier NOT NULL DEFAULT newid(), 
  used   bit     NOT NULL DEFAULT 0, 
  date_generated datetime   NOT NULL DEFAULT GETDATE(), 
  date_used  datetime   NULL
 )
GO

CREATE PROCEDURE GetGuid
 @guid uniqueidentifier OUTPUT
AS
BEGIN
 SET NOCOUNT ON
 DECLARE @return int = 0
 SET @guid = NULL -- ignore any value passed in by the caller

 BEGIN TRY
  -- Claim one unused guid in a single atomic statement. A separate
  -- SELECT followed by an UPDATE would let two sessions pick the same row.
  -- READPAST skips rows that another session has locked but not committed.
  UPDATE TOP (1) guids WITH (ROWLOCK, READPAST)
  SET
   used = 1,
   date_used = GETDATE(),
   @guid = guid
  WHERE used = 0

  IF @guid IS NULL
   BEGIN
    SET @return = -1
    PRINT 'GetGuid Error: No unused guids are available'
   END
 END TRY

 BEGIN CATCH
  SET @return = ERROR_NUMBER() -- some error occurred
  SET @guid = NULL
  PRINT 'GetGuid Error: ' + CAST(ERROR_NUMBER() AS varchar(10)) + CHAR(13) + CHAR(10) + ERROR_MESSAGE()
 END CATCH

 RETURN @return
END
GO

CREATE PROCEDURE InsertIntoTestInsert
 @TheValue nvarchar(255)
AS
 BEGIN
  SET NOCOUNT ON
  DECLARE @return int = 0

  DECLARE @guid uniqueidentifier
  DECLARE @getguid_return int

  EXEC @getguid_return = GetGuid @guid OUTPUT

  IF @getguid_return = 0 
   BEGIN
    INSERT INTO testinsert(guid, TheValue) VALUES (@guid, @TheValue)
   END
  ELSE
   SET @return = -1

  RETURN @return
 END
GO

-- generate the guids
INSERT INTO guids(used) VALUES (0)
INSERT INTO guids(used) VALUES (0)

--Insert data through the stored proc
EXEC InsertIntoTestInsert N'Foo 1'
EXEC InsertIntoTestInsert N'Foo 2'
EXEC InsertIntoTestInsert N'Foo 3' -- will fail, only two guids were created

-- look at the inserted data
SELECT * FROM testinsert

-- look at the guids table
SELECT * FROM guids

The fun question is... how do you map this to ADO.Net's Entity Framework?

This is a classic problem that started in the early days of ORM (Object Relational Mapping).

If you use relational-database best practices (never allow direct access to base tables, only allow data manipulation through views and stored procedures), then you add headcount (someone capable and willing to write not only the database schema, but also all the views and stored procedures that form the API) and introduce delay (the time to actually write this stuff) to the project.

So everyone cuts this and people write queries directly against a normalized database, which they don't understand... thus the need for ORM, in this case, the ADO.NET Entity Framework.

ORM scares the heck out of me. I've seen ORM tools generate horribly inefficient queries which bring otherwise performant database servers to their knees. What was gained in programmer productivity was lost in end-user waiting and DBA frustration.

橘香 2024-09-13 02:35:03

If it's viable to change the structure of the table, then perhaps use a uniqueidentifier for the PK instead, along with newid() [SQL] or Guid.NewGuid() [C#] in your row generation code.

From the Guid.NewGuid() documentation:

There is a very low probability that the value of the new Guid is all zeroes or equal to any other Guid.
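A small sketch of what a uniqueidentifier primary key could look like, assuming hypothetical tables dbo.Widget and dbo.WidgetPart. The point is that the key is known before the parent row is written, so related rows can be prepared up front.

```sql
-- Hypothetical parent and child tables keyed by uniqueidentifier.
CREATE TABLE dbo.Widget
(
    WidgetId uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Name     nvarchar(255)    NULL
);

CREATE TABLE dbo.WidgetPart
(
    WidgetPartId uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    WidgetId     uniqueidentifier NOT NULL REFERENCES dbo.Widget (WidgetId),
    PartName     nvarchar(255)    NULL
);
GO

-- The key is generated before any row exists. In application code the
-- same value could come from Guid.NewGuid() and be passed as a parameter.
DECLARE @widgetId uniqueidentifier = NEWID();

INSERT INTO dbo.Widget (WidgetId, Name)
VALUES (@widgetId, N'Parent row');

INSERT INTO dbo.WidgetPart (WidgetId, PartName)
VALUES (@widgetId, N'Child row');
```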

宛菡 2024-09-13 02:35:01
  • Use a more universal unique identifier data type such as UNIQUEIDENTIFIER (UUID) instead of INTEGER. In this case you can create the value on the client side, pass it to SQL, and not have to worry about it. The disadvantage, of course, is the size of this field.
  • Create a simple table in the database, CREATE TABLE ID_GEN (ID INTEGER IDENTITY), and use it as a factory that gives you identifiers. Ideally you would create a stored procedure (or function) to which you pass the number of identifiers you need. The stored procedure then inserts that number of (empty) rows into the ID_GEN table and returns all the new IDs, which you can use in your code. Obviously, your original tables will no longer have the IDENTITY.
  • Create your own variation of the ID_GEN factory above.

I would choose simplicity (UUID) if you are not constrained otherwise.
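The ID_GEN factory from the second bullet might look roughly like this. The procedure name dbo.GetNewIds is an assumption, and the loop is only one simple way to insert the requested number of empty rows.

```sql
CREATE TABLE dbo.ID_GEN (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure: returns @Count newly generated ids as a result set.
CREATE PROCEDURE dbo.GetNewIds
    @Count int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ids TABLE (ID int NOT NULL);
    DECLARE @i int = 0;

    WHILE @i < @Count
    BEGIN
        -- The table has only the identity column, so an "empty" row is
        -- inserted with DEFAULT VALUES and the new id is captured via OUTPUT.
        INSERT INTO dbo.ID_GEN
        OUTPUT inserted.ID INTO @ids (ID)
        DEFAULT VALUES;

        SET @i = @i + 1;
    END

    SELECT ID FROM @ids ORDER BY ID;
END
GO

-- Usage: ask the factory for five identifiers.
EXEC dbo.GetNewIds @Count = 5;
```

The returned ids are unique, but they are not guaranteed to be contiguous when several callers run the procedure at the same time, so the caller should use the returned list rather than assume a range.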
