Could this INSERT cause any locking/concurrency problems?
In an effort to avoid auto sequence numbers and the like for one reason or another in this particular database, I wondered if anyone could see any problems with this:
INSERT INTO user (label, username, password, user_id)
SELECT 'Test', 'test', 'test', COALESCE(MAX(user_id)+1, 1) FROM user;
I'm using PostgreSQL (but also trying to be as database-agnostic as possible).
EDIT:
There are two reasons I want to do this.
- Keeping dependency on any particular RDBMS low.
- Not having to worry about updating sequences if the data is batch-updated to a central database.
Insert performance is not an issue as the only tables where this will be needed are set-up tables.
EDIT-2:
The idea I'm playing with is that each table in the database has a human-generated SiteCode as part of its key, so we always have a compound key. This effectively partitions the data on SiteCode and would allow taking the data from a particular site and putting it somewhere else (obviously on the same database structure). For instance, this would allow backing up various operational sites onto one central database, but also allow that central database to have operational sites using it.
I could still use sequences, but it seems messy. The actual INSERT would look more like this:
INSERT INTO user (sitecode, label, username, password, user_id)
SELECT 'SITE001', 'Test', 'test', 'test', COALESCE(MAX(user_id)+1, 1)
FROM user
WHERE sitecode='SITE001';
If that makes sense.
I've done something similar before and it worked fine, however the central database in that case was never operational (it was more of a way of centrally viewing data / analyzing) so it did not need to generate ids.
EDIT-3:
I'm starting to think it'd be simpler to only ever allow the centralised database to be either active-only or backup-only, thus avoiding the problem completely and allowing a simpler design.
Oh well, back to the drawing board!
Comments (6)
By all means use a sequence to generate unique numbers. They are fast, transaction safe and reliable.
Any self-written implementation of a "sequence generator" is either not scalable for a multi-user environment (because you need to do heavy locking) or simply not correct.
If you do need to be DBMS-independent, then create an abstraction layer that uses sequences for those DBMSs that support them (Postgres, Oracle, Firebird, DB2, Ingres, Informix, ...) and a self-written generator on those that don't.
Trying to create a system that is DBMS-independent simply means it will run equally slowly on all systems if you don't exploit the advantages of each DBMS.
Your goal is a good one. Avoiding IDENTITY and AUTOINCREMENT columns means avoiding a whole plethora of administration problems. Here is just one example of the many.
However, most responders at SO will not appreciate it; the popular (as opposed to technical) response is "always stick an Id AUTOINCREMENT column on everything that moves".
- A next-sequential number is fine; all vendors have optimised it.
- As long as this code is inside a Transaction, as it should be, two users will not get the same MAX()+1 value. There is a concept called Isolation Level which needs to be understood when coding Transactions.
- Getting away from user_id and onto a more meaningful key such as ShortName, or State plus UserNo, is even better (the former spreads the contention, the latter avoids the next-sequential contention altogether, which is relevant for high-volume systems).
- What MVCC promises, and what it actually delivers, are two different things. Just surf the net or search SO to view the hundreds of problems re PostgreSQL/MVCC. In the realm of computers, the laws of physics apply; nothing is free. MVCC stores private copies of all rows touched, and resolves collisions at the end of the Transaction, resulting in far more Rollbacks, whereas 2PL blocks at the beginning of the Transaction, and waits, without the massive storage of copies.
- The first example code block is fine.
- As per Comments, this item no longer applies: The second example code block has an issue. "SITE001" is not a compound key, it is a compounded column. Do not do that; separate "SITE" and "001" into two discrete columns. And if "SITE" is a fixed, repeating value, it can be eliminated.
Different users can end up with the same user_id, because concurrent SELECT statements will see the same MAX(user_id).
If you don't want to use a SEQUENCE, you have to use an extra table with a single record and update this single record every time you need a new unique id:
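The code that originally followed this answer is not preserved here. As a minimal sketch of the single-record idea, assuming a hypothetical table `id_counter` with one column `last_id` (both names are mine, not the answerer's), and using Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs on its own:

```python
import sqlite3


def create_counter(conn: sqlite3.Connection) -> None:
    """Create the hypothetical one-row counter table, starting at 0."""
    conn.execute("CREATE TABLE id_counter (last_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO id_counter (last_id) VALUES (0)")
    conn.commit()


def next_id(conn: sqlite3.Connection) -> int:
    """Bump the counter and read it back inside one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # The UPDATE comes first, so the writer's lock is held before the read.
        conn.execute("UPDATE id_counter SET last_id = last_id + 1")
        return conn.execute("SELECT last_id FROM id_counter").fetchone()[0]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    create_counter(demo)
    print(next_id(demo), next_id(demo))
```

On PostgreSQL the two statements could probably be collapsed into `UPDATE id_counter SET last_id = last_id + 1 RETURNING last_id`. Either way, it is the lock taken by the UPDATE that makes concurrent callers queue up, which is also why this approach scales worse than a native sequence.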
I agree with maksymko, but not because I dislike sequences or autoincrementing numbers, as they have their place. If you need a value to be unique throughout your "various operational sites" i.e. not only within the confines of the single database instance, a globally unique identifier is a robust, simple solution.
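A rough sketch of that idea, assuming random (version 4) UUIDs generated in the application with Python's `uuid` module. The `users` table and the helper names are hypothetical, and in-memory sqlite3 databases stand in for two sites and the central database:

```python
import sqlite3
import uuid

SCHEMA = "CREATE TABLE users (user_id TEXT PRIMARY KEY, username TEXT NOT NULL)"


def new_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def insert_user(conn: sqlite3.Connection, username: str) -> str:
    """Insert a user under a freshly generated UUID and return that key."""
    user_id = str(uuid.uuid4())  # no coordination with any other site needed
    with conn:
        conn.execute(
            "INSERT INTO users (user_id, username) VALUES (?, ?)",
            (user_id, username),
        )
    return user_id


def copy_site_to_central(site: sqlite3.Connection, central: sqlite3.Connection) -> int:
    """Copy every user row from a site into the central database."""
    rows = site.execute("SELECT user_id, username FROM users").fetchall()
    with central:
        central.executemany(
            "INSERT INTO users (user_id, username) VALUES (?, ?)", rows
        )
    return len(rows)
```

Because each key is generated without looking at existing rows, there is no MAX() to race on and nothing to renumber when sites are merged. The trade-off is a wider, non-sequential key; PostgreSQL does have a native `uuid` column type that could replace the TEXT column used here.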
There are a couple of points:
- … the COPY command. It is much faster than running a large swath of INSERT statements.
- … if there is an index on the user_id column. But the real problem is that if two transactions start at the same time, they will see the same MAX(user_id) value. That leads me to the next point:
- … SEQUENCEs. These are essentially a place from which you can draw the next user id. If you are really worried about the performance of generating the next sequence number, you can generate a batch of them per thread and only request a new batch when the current one is exhausted (sometimes called a HiLo sequence).
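The batching idea in the last point can be sketched roughly as follows. This is only an illustration under my own assumptions: a hypothetical one-row table `hilo` holds the number of blocks handed out so far, sqlite3 stands in for the real database, and each allocator object plays the role of one thread.

```python
import sqlite3


def create_hilo_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical one-row table that counts reserved blocks."""
    conn.execute("CREATE TABLE hilo (next_hi INTEGER NOT NULL)")
    conn.execute("INSERT INTO hilo (next_hi) VALUES (0)")
    conn.commit()


class HiLoAllocator:
    """Hands out ids from a locally reserved block; hits the database once per block."""

    def __init__(self, conn: sqlite3.Connection, block_size: int = 100) -> None:
        self.conn = conn
        self.block_size = block_size
        self._next = 0   # next id to hand out
        self._limit = 0  # first id that is NOT in the current block

    def _reserve_block(self) -> None:
        with self.conn:  # one short transaction per block
            self.conn.execute("UPDATE hilo SET next_hi = next_hi + 1")
            hi = self.conn.execute("SELECT next_hi FROM hilo").fetchone()[0] - 1
        self._next = hi * self.block_size + 1
        self._limit = self._next + self.block_size

    def next_id(self) -> int:
        if self._next >= self._limit:  # block exhausted (or never reserved)
            self._reserve_block()
        value = self._next
        self._next += 1
        return value
```

With a block size of 3, a first allocator would hand out 1, 2, 3, a second one would start at 4, and the first would jump to 7 once its block runs out. Ids left unused in a block are simply lost, so this only suits cases where gaps are acceptable.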
Yes, I can see a huge problem. Don't do it.
Multiple connections can get the EXACT SAME id at the same time. I was going to add "under load", but it doesn't even need that; it just needs the right timing between two queries.
To avoid it, you can use transactions or locking mechanisms or isolation levels specific to each DB, but once we get to that stage, you might as well use the dbms-specific sequence/identity/autonumber etc.
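To make the timing concrete, here is a small simulation, not a reproduction on PostgreSQL. It assumes a hypothetical `users` table without a unique constraint, uses two sqlite3 connections as the two sessions, and deliberately splits the read and the write into separate steps to force the unlucky interleaving. On PostgreSQL at its default isolation level I would expect two concurrently running copies of the single INSERT ... SELECT to be able to overlap in the same way.

```python
import os
import sqlite3
import tempfile

NEXT_ID_SQL = "SELECT COALESCE(MAX(user_id) + 1, 1) FROM users"


def simulate_race() -> list[int]:
    """Return the user_id values stored after two interleaved sessions."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "race.db")
        # isolation_level=None puts the sqlite3 module in autocommit mode.
        session_a = sqlite3.connect(path, isolation_level=None)
        session_b = sqlite3.connect(path, isolation_level=None)
        try:
            session_a.execute("CREATE TABLE users (username TEXT, user_id INTEGER)")
            # Both sessions compute MAX()+1 before either of them has written.
            id_a = session_a.execute(NEXT_ID_SQL).fetchall()[0][0]
            id_b = session_b.execute(NEXT_ID_SQL).fetchall()[0][0]
            session_a.execute("INSERT INTO users VALUES ('alice', ?)", (id_a,))
            session_b.execute("INSERT INTO users VALUES ('bob', ?)", (id_b,))
            rows = session_a.execute(
                "SELECT user_id FROM users ORDER BY username"
            ).fetchall()
            return [row[0] for row in rows]
        finally:
            session_a.close()
            session_b.close()


if __name__ == "__main__":
    print(simulate_race())  # both rows carry user_id 1
```

A primary key or unique constraint on user_id would at least turn the silent duplicate into an error on the second insert, which the application would then have to catch and retry.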
EDIT
For question edit2, there is no reason to fear gaps in the user_id, so you can have one sequence across all sites. If gaps are OK, some options are:
- … update tblsitesequenceno set @nextnum = nextnum = nextnum + 1
  Multiple callers to this statement are each guaranteed to get a unique number.
If you cannot have gaps at all, consider using a transaction mechanism that will restrict access while you are running the max() query. Either that, or use a proliferation of sequences, tables with identity columns, or tables with autonumber that you manipulate using dynamic SQL, with the same technique as for a single sequence.
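As a sketch of the "restrict access while running max()" option, assuming sqlite3 as a stand-in: `BEGIN IMMEDIATE` takes SQLite's write lock before MAX() is read, so a second session cannot compute the same value in the meantime. The `users` table layout and the function names are my own, and the connections are assumed to be opened with `isolation_level=None` so that the explicit BEGIN/COMMIT statements are in control.

```python
import os
import sqlite3
import tempfile

INSERT_SQL = (
    "INSERT INTO users (sitecode, username, user_id) "
    "SELECT ?, ?, COALESCE(MAX(user_id) + 1, 1) FROM users WHERE sitecode = ?"
)
MAX_SQL = "SELECT MAX(user_id) FROM users WHERE sitecode = ?"


def insert_locked(conn: sqlite3.Connection, sitecode: str, username: str) -> int:
    """Insert with MAX()+1 under a write lock.

    Raises sqlite3.OperationalError if another session already holds the lock.
    """
    conn.execute("BEGIN IMMEDIATE")  # lock is taken BEFORE MAX() is evaluated
    try:
        conn.execute(INSERT_SQL, (sitecode, username, sitecode))
        new_id = conn.execute(MAX_SQL, (sitecode,)).fetchall()[0][0]
        conn.execute("COMMIT")
        return new_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


def demo_locked_insert() -> tuple[bool, list[int]]:
    """Show that a second session is refused while the first holds the lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "locked.db")
        # timeout=0: fail immediately instead of waiting for the lock.
        a = sqlite3.connect(path, isolation_level=None, timeout=0)
        b = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            a.execute(
                "CREATE TABLE users (sitecode TEXT, username TEXT, "
                "user_id INTEGER, PRIMARY KEY (sitecode, user_id))"
            )
            a.execute("BEGIN IMMEDIATE")  # session A holds the write lock
            try:
                insert_locked(b, "SITE001", "bob")
                blocked = False
            except sqlite3.OperationalError:
                blocked = True  # session B could not even start
            a.execute("COMMIT")
            ids = [
                insert_locked(a, "SITE001", "alice"),
                insert_locked(b, "SITE001", "bob"),
            ]
            return blocked, ids
        finally:
            a.close()
            b.close()
```

On PostgreSQL the closest equivalent would probably be `LOCK TABLE ... IN EXCLUSIVE MODE` at the start of the transaction, where a waiting session blocks rather than fails. Either way every insert into the table is serialized, which is the cost of guaranteeing no gaps and no duplicates.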