避免数据库中出现重复的标识符
注意:了解所有答案,谢谢,但我们已经有了一个序列..并且不能使用 UNIQUE 约束,因为有些项目需要重复..我需要使用 PLSQL 以某种方式处理这个问题,所以基于一些标准(使用if 语句)我需要确保没有重复项..只是为了确认,这些标识符是使用不同类型的字符串进行定制的..对于每组字符串,我们有一个仅针对该字符串进行计数的数字(STR-STR-####) 我们有数百个这样的 STR-STR 组合,对于每个组合,我们都有一个 #### 进行计数...除此之外,还允许一些 STR-STR 组合有重复项..所以我们不能使用唯一约束,我们不能使用主键,因为它不是一个简单的数字,最重要的是我们确实为每个项目分配了一个主键..这些标识符是为用户准备的而不是用于数据库管理。
当用户创建一个项目时,根据一些具体条件,我们为该项目赋予一定的编号。在保存函数中,调用一个函数来准备第一个初始字符串,然后扫描保存编号的表并生成为该特定字符串分配下一个可用的 4 位数字。
现在这有一些问题,在获取下一个数字和提交到数据库之间大约有 1000 行代码。问题是,当 2 个人在几秒钟内创建具有相同标准的项目时,有时会发出相同的编号。
我所做的是,在提交之前,我检查数据库中的号码,如果存在,我调用该函数再次获取下一个可用号码...
即使此代码减少了重复的机会,如果我同时保存了 2 个项目,但仍然得到重复的数字。
有人知道有什么方法可以避免重复并将重复数字的机会降至 0 吗?
EDIT1:我们已经有一个主键..这个标识符是一个特殊的字符串,并且非常定制,因此不可能只使用从 0 开始计数的数字
EDIT2:在某些情况下我们需要重复。这是非常小的(也许大约 10 个不同的项目轨道使用重复项),所以在我使用 if 语句提交之前检查重复项之前,所以如果项目保存不属于应该具有的系统之一然后重复,我跳过检查...
编辑 3:我们在这里使用 PL/SQL
编辑 4:我想这个问题非常具体,我没有完全表达出来。虽然有很多答案,但没有一个确实抓住了我的问题..无论如何,我解决了问题并添加了以下内容作为我的答案..
NOTICE: Appericiate all the answers, thanks but we already have a sequence.. and can't use the UNIQUE contraints because some items need to have duplicates.. I need to handle this using PLSQL somehow, so based on some criteria (using an if statement) i need to ensure there is no duplicates for that.. And just to confirm, these identifiers are very customized with different types of string.. And for each set of string, we have a number taht counts up only for that string (STR-STR-####) and we have like hundreds of these STR-STR combinations and for each combination we have a #### that counts up... And on top of these some STR-STR combinations are allowed to have duplicates.. So we CAN'T use UNIQUE CONTRAINTS, not can we use PRIMARY KEY as it's not a simple number and on top of that we do have a PRIMARY KEY assigned to the each item.. These Identifiers are for the users and not for the database management.
When a user creates an item, based on some creteria, we give a certain number to the item.. In the save function, a function is called to prepare the first initial string, then the table where the numbers are kept is scanned and the next available 4 digit number for that particular string is assigned.
Now there is some issues with this, there is about 1000 lines of codes between getting the next number and the commit to the database. The issue is when 2 people create an item with the same criteria within seconds of each other, sometimes the same number is issued.
What i've done is that just before the commit, i check the database for the number and if it exist, i call the function to get the next available number again...
Even though this code decreased the chances of a duplictation, if i save 2 items exactly at the same time, i still get a duplicate number..
Anyone know any way to avoid duplicates and drop the chances of a duplicate number to a 0?
EDIT1: We already have a primary key.. This identifier is a special string, and very customized, therefore it woudn't be possible to just use a number that counts up from 0
EDIT2: There is also some cases where we need duplicates.. this is very minor (maybe about 10 different item tracks use duplicates) so before i do my checks for duplicates before the commit with an if statment, so if the item saves dosn't belong to one of the systems that is supposed to have duplicates then, i skip the check...
EDIT 3: We are using PL/SQL here
EDIT 4: I guess this problem was very specific and i didn't quite convey it so.. Although there were many answers, noone of them really captured my problem.. Regardless,i solved the problem and added below as my answer..
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(13)
查找命令 CREATE SEQUENCE。 Oracle 可以为您处理唯一编号的生成。 (实际上)每个数据库都有一种处理此问题的方法,尽管它们的实现方式有所不同。
[针对问题中编辑的新要求]
您仍然可以使用 SEQUENCE 来生成计数部分,然后将其与前缀组合并将其保存回数据库。而且,在极少数需要重复 ID 的情况下,不要从 SEQUENCE 中获取数字,只需使用已有的 ID。但 SEQUENCE 将解决您创建“井”的问题,您可以在需要时从中抽取保证唯一的数字
Look up the command CREATE SEQUENCE. Oracle can handle the unique number generation for you. (Virtually) Every database has a way of handling this issue although they differ somewhat in how it's implemented.
[In response to the new requirements edited into the question]
You can still use a SEQUENCE to produce the counting-up part, then combine that with the prefix and save it back to the database. And, in those rare cases in which you need a duplicate ID, don't get a number from the SEQUENCE, just use the one you already have. But SEQUENCE will solve your problem of creating a "well" from which you can draw a guaranteed unique number when needed
听起来您已将 3 条数据非规范化为 1 个字段。这是您的字符串字段当前包含的内容:
这是您真正应该拥有的内容(仅示例字段名称;给出这些有意义的名称会有所帮助,但我不这样做知道您的数据代表什么):
您现在可以在 ID 字段中使用序列。
当您想要恢复原始字符串时,可以连接 Str、Str2 和 ID 字段的内容(使用分隔连字符)。
简而言之:您的数据库设计已被破坏,您现在正在为此付出代价。我的建议是通过将您的身份字段标准化为 3 个独立的字段来修复设计。或者,您会发现您花费数小时重新创建数据库的内置功能,最终得到的解决方案存在缺陷并且存在严重的竞争条件问题。
或者,如果 PL/SQL 允许所有此功能:
创建一个包含以下字段的表:
然后:
您不会遇到并发问题,因为每次尝试获取 ID 都必须等待任何其他尝试完成。每个 STR-STR 对都有自己的计数器。
It sounds like you've denormalised 3 pieces of data into 1 field. This is what your string field currently contains:
This is what you really should have (example field names only; it would help to give these meaningful names, but I don't know what your data represents):
You can now use a sequence in the ID field.
When you want to get your original string back, you concatenate the contents of the Str, Str2 and ID fields (with dividing hyphens).
In short: Your database design is broken, and you're now paying the price for it. My advice would be to fix the design by normalising your identity field into 3 separate fields. Alternatively, you'll find that you spend hours recreating built-in functionality of the database and end up with a solution that is buggy and has horrible problems with race conditions.
Alternatively, if PL/SQL allows all of this functionality:
Create a table that contains the following fields:
Then:
You'll have no concurrency problems, as each attempt to get an ID will have to wait for any other attempts to finish. Each STR-STR pair will have its own counter.
我几乎不想提出这个建议,但由于您选择的方法非常糟糕,所以相比之下我要在这里展示的方法相当不错。请原谅讽刺,但是当您有这么多用户告诉您您的方法总体上是错误的时,您应该考虑到这一点。
在某个地方,您可能正在运行这样的 SQL:
然后您可能会使用
SUBSTR
拆分变量,将第二部分解析为NUMBER
,递增它,并构建一个新变量与新代码。您可以做的是将
FOR UPDATE
子句添加到SELECT
语句中,对相关记录加锁。当然,您实际上并没有更新它们,但根据您的定义,您必须序列化操作。这是一种低性能、不可扩展且肮脏的方式来获得你想要的东西,但它应该有效。检查 Oracle 文档以了解所有含义。I almost hate to suggest this, but since the approach that you settled on was pretty lousy then the one I'm going to show here is pretty good in comparison. Forgive the sarcasm, but when you have this many users telling you that your approach in general is wrong you ought to take that into consideration.
Somewhere you are probably running a SQL like this:
Then later you're probably split the variable using
SUBSTR
, parse the second part into aNUMBER
, increment it, and build a new variable with the new code.What you can do is add the
FOR UPDATE
clause to theSELECT
statement, placing a lock on the records in question. Sure, you're not actually updating them, but by your definition you have to serialize the operation. This is a non-performant, unscalable, and dirty way to get what you want, but it should work. Check the Oracle docs to see all the implications.在该列上使用唯一索引,或主键!
Use a UNIQUE INDEX on that column, or a PRIMARY KEY!
这令人担忧。您确实应该为此使用存储过程并将其全部包装在事务中。您无法保证任何两条记录都会具有不同的编号并会导致问题。但我同意 - 您需要将列设置为唯一标识符并使用主键 - 我认为它是一个关系数据库!
This is worrying. You should really be using a stored procedure for this and wrap it all in a transaction. You got nothing guaranteeing that any two records will have a different number and will cause problems. But I agree - you need to set a column as a unique identifer and use primary keys - it is a relational database I assume!
您可能需要某种序列化过程来完成此过程。我建议的一种方法是在初始插入期间将此字段留空,等待过程提交,然后让另一个进程(例如重复作业)填充此列。
此其他流程将根据您的业务规则填充所有行。该作业将是唯一能够接触此列的作业,因此您将避免任何并发问题。
您可以将作业设置为每 X 秒运行一次,这意味着在此设置中该列将为空的短暂延迟。或者,您可以让初始会话在提交后通过某种序列化启动更新过程(对行进行抢占式锁定,以便没有两个更新过程可以同时运行)。
You might need some sort of serialization process to go through this. One path I would recommend is leaving this field blank during the initial insert, wait for the procedure to commit and then have another process (a recurring job for example) fill this column.
This other process would fill all rows according to your business rules. This job would be the only one able to touch this column and you would therefore avoid any concurrency problem.
You could set the job to run every X seconds, this would imply a small delay during which the column would be empty in this setup. Or you could let the initial session launch the update process after the commit with some sort of serialization (a preemptive lock on a row so that no two update process can run at the same time).
这可以使用基于函数的唯一索引来完成。首先,您需要向表中添加一列,以指定每行的标识符是否需要唯一:
现在创建一个唯一的 FBI,仅保存需要唯一的标识符:
最后,在标识符生成逻辑中,每当您需要标识符是唯一的,设置identifier_must_be_unique='Y';否则将其保留为空)。随后,联邦调查局将实施条件限制。
This could be done using a function-based unique index. First you'd need to add a column to the table to specify whether the identifier needs to be unique for each row:
Now create a unique FBI that only holds the identifiers that need to be unique:
Finally, in your identifier generation logic, whenever you need the identifier to be unique, set identifier_must_be_unique='Y'; otherwise leave it as null). The FBI will then implement the conditional constraint.
创建自动递增的 PRIMARY KEY 列或 UNIQUE KEY 列(使用
AUTOINCRMENT
关键字或通过 SEQUENCE(如 Oracle 所做的那样))。这样,如果 2 个人同时添加 2 个精确数据行,数据库将添加 2 个精确值,但每个值都有自己的唯一 ID。Create a PRIMARY KEY column or a UNIQUE KEY column that is auto incremented (either with an
AUTOINCREMENT
keyword or via SEQUENCE (as Oracle does)). That way, if 2 people add 2 exact data row at the exact same time, the database will add 2 exact values but each with their own unique ID.你的代码绝对应该用序列替换。 Oracle 很好地管理了这种并发性。
此外,该序列应保存在约束为 UNIQUE 的列中。
Yuor code should definitely be replaced with a SEQUENCE. Oracle manages this concurrency very well.
Also, this sequence should be saved in a column that is constrained as UNIQUE.
您可以添加一个表来保存所有已使用且需要唯一的标识符,如下所示:
然后,每当在 350 行逻辑中生成需要唯一的“STR-STR=####”值时,将其插入该表中。如果插入失败,请获取另一个号码并重试。
You could add a table to hold all the identifiers that have been used and need to be unique like this:
Then whenever in your 350-line logic you generate a 'STR-STR=####' value that needs to be unique, insert it into that table. If the insert fails, get another number and try again.
您的数据模型已损坏。但是,我假设修复它不是一个可行的选择。如果最大的问题是当您对包含序列值的表运行两个快速选择时出现重复键(您应该在此处使用序列...但您知道这一点)...那么您可以尝试使用“select .. .for update”,这将锁定访问它的会话的行。请注意,此逻辑可能会给应用程序带来延迟。
You have a broken data model. But, i'm assuming fixing it isn't a viable option. If you biggest problem is the duplicate keys when you run two quick selects against the table that contains the sequence values (you should use sequences here... but you know that)... then you can try to use the "select ... for update" which will lock the row for the session accessing it. Be careful, this logic may introduce latency into the application.
我可能会遗漏一些明显的东西,但我会建议一些类似于其他人之前建议的东西。在列上创建唯一约束,这样当您尝试保留记录时,如果其他进程已使用该自定义生成的 ID,则记录将会失败。然后,您可以在 PL/SQL 中捕获该异常并实现某种重试逻辑,直到您能够成功地将唯一生成的 ID 插入到数据库中。
I may be missing something obvious, but I'd suggest something similar to what someone else suggested earlier. Create a unique constraint on the column such that when you try to persist the record, it will fail if some other process has already used that custom generated ID. You then can trap that exception in the PL/SQL and implement some sort of retry logic until you are able to successfully insert a unique generated ID into the database.
好吧,没有一个答案真正解决了问题,主要是因为我没有正确传达整个情况。
但基本上我在提交之前将检查放入循环中,并在同一个循环中更新、提交和再次检查。如果仍然存在,循环会再次运行,在这种情况下会进行检查。这有点难以解释,但重复的可能性极低(循环的限制为 100)
Well noone of the answers really solved the problem, mostly because i didn't convey the whole situtation properly..
But basically i put my check just before the commit into a loop and updated and committed and checked again within the same loop.. and if still existed, the loop ran again, in which case the check happened.. It's kinda hard to explain but the chances of a duplicate is extremly low (the loop has a limit of 100)