Could this INSERT cause any locking / concurrency problems?

Posted 2024-10-16 00:41:57

In an effort to avoid auto sequence numbers and the like for one reason or another in this particular database, I wondered if anyone could see any problems with this:

INSERT INTO user (label, username, password, user_id)
SELECT 'Test', 'test', 'test', COALESCE(MAX(user_id)+1, 1) FROM user;

I'm using PostgreSQL (but also trying to be as database-agnostic as possible).

EDIT:
There are two reasons for me wanting to do this.

  • Keeping dependency on any particular RDBMS low.
  • Not having to worry about updating sequences if the data is batch-updated to a central database.

Insert performance is not an issue as the only tables where this will be needed are set-up tables.

EDIT-2:
The idea I'm playing with is that each table in the database has a human-generated SiteCode as part of its key, so we always have a compound key. This effectively partitions the data on SiteCode and would allow taking the data from a particular site and putting it somewhere else (obviously on the same database structure). For instance, this would allow backing up various operational sites onto one central database, but also allow that central database to have operational sites using it.
I could still use sequences, but it seems messy. The actual INSERT would look more like this:

INSERT INTO user (sitecode, label, username, password, user_id)
SELECT 'SITE001', 'Test', 'test', 'test', COALESCE(MAX(user_id)+1, 1)
FROM user
WHERE sitecode='SITE001';

If that makes sense: I've done something similar before and it worked fine; however, the central database in that case was never operational (it was more a way of centrally viewing / analysing data), so it did not need to generate ids.
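For illustration, a minimal sketch of the compound-key layout described above (column names come from the INSERT; the column types are assumptions, and "user" is quoted because it is a reserved word in PostgreSQL):

CREATE TABLE "user" (
    sitecode  varchar(10)  NOT NULL,   -- human-generated site code
    user_id   integer      NOT NULL,   -- per-site sequential id
    label     varchar(100),
    username  varchar(100),
    password  varchar(100),
    PRIMARY KEY (sitecode, user_id)    -- compound key: rows are partitioned by site
);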

EDIT-3:
I'm starting to think it'd be simpler to only ever allow the centralised database to be either active-only or backup-only, thus avoiding the problem completely and allowing a more simple design.

Oh well, back to the drawing board!

6 Answers

我不咬妳我踢妳 2024-10-23 00:41:58

By all means use a sequence to generate unique numbers. They are fast, transaction-safe and reliable.

Any self-written implementation of a "sequence generator" is either not scalable for a multi-user environment (because you need to do heavy locking) or simply not correct.

If you do need to be DBMS-independent, then create an abstraction layer that uses sequences for those DBMS that support them (Postgres, Oracle, Firebird, DB2, Ingres, Informix, ...) and a self-written generator on those that don't.

Trying to create a system that is DBMS-independent simply means it will run equally slowly on all systems if you don't exploit the advantages of each DBMS.
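For the PostgreSQL side of such an abstraction layer, a minimal sketch might look like this (the sequence name user_id_seq is an assumption):

-- The sequence hands out user_id values; nextval() is safe under concurrency.
CREATE SEQUENCE user_id_seq;

INSERT INTO "user" (label, username, password, user_id)
VALUES ('Test', 'test', 'test', nextval('user_id_seq'));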

年华零落成诗 2024-10-23 00:41:58

Your goal is a good one. Avoiding IDENTITY and AUTOINCREMENT columns means avoiding a whole plethora of administration problems. Here is just one example of the many.

  • However, most responders at SO will not appreciate it; the popular (as opposed to technical) response is "always stick an Id AUTOINCREMENT column on everything that moves".

  • A next-sequential number is fine, all vendors have optimised it.

  • As long as this code is inside a Transaction, as it should be, two users will not get the same MAX()+1 value. There is a concept called Isolation Level which needs to be understood when coding Transactions.

  • Getting away from user_id and onto a more meaningful key such as ShortName or State plus UserNo is even better (the former spreads the contention, the latter avoids the next-sequential contention altogether; relevant for high-volume systems).

  • What MVCC promises, and what it actually delivers, are two different things. Just surf the net or search SO to view the hundreds of problems re PostgreSQL/MVCC. In the realm of computers, the laws of physics apply; nothing is free. MVCC stores private copies of all rows touched, and resolves collisions at the end of the Transaction, resulting in far more Rollbacks. Whereas 2PL blocks at the beginning of the Transaction, and waits, without the massive storage of copies.

    • Most people with actual experience of MVCC do not recommend it for high-contention, high-volume systems.

The first example code block is fine.

As per Comments, this item no longer applies: The second example code block has an issue. "SITE001" is not a compound key, it is a compounded column. Do not do that; separate "SITE" and "001" into two discrete columns. And if "SITE" is a fixed, repeating value, it can be eliminated.
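Regarding the isolation-level point above, a minimal sketch of wrapping the original INSERT in an explicit transaction at the strictest standard isolation level (PostgreSQL syntax shown; other DBMSs use the same keyword but behave differently, and one of two concurrent transactions may then fail and need a retry):

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO "user" (label, username, password, user_id)
SELECT 'Test', 'test', 'test', COALESCE(MAX(user_id) + 1, 1) FROM "user";
COMMIT;  -- a concurrent serializable transaction doing the same may be rolled back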

红ご颜醉 2024-10-23 00:41:58

Different users can end up with the same user_id; concurrent SELECT statements will see the same MAX(user_id).

If you don't want to use a SEQUENCE, you have to use an extra table with a single record and update this single record every time you need a new unique id:

CREATE TABLE my_sequence(id INT);
INSERT INTO my_sequence(id) VALUES (0);  -- the counter table needs exactly one row to start from

BEGIN;
-- The UPDATE locks the single counter row, so concurrent transactions queue up here.
UPDATE my_sequence SET id = COALESCE(id, 0) + 1;
INSERT INTO 
  user (label, username, password, user_id)
SELECT 'Test', 'test', 'test', id FROM my_sequence;
COMMIT;

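On PostgreSQL specifically, a sketch of the same pattern collapsed into a single statement with a data-modifying CTE (needs PostgreSQL 9.1 or later; this variant is not part of the original answer):

-- The UPDATE takes a row lock on the counter row, so concurrent callers are serialized.
WITH next_id AS (
    UPDATE my_sequence SET id = COALESCE(id, 0) + 1
    RETURNING id
)
INSERT INTO "user" (label, username, password, user_id)
SELECT 'Test', 'test', 'test', id FROM next_id;
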
┊风居住的梦幻卍 2024-10-23 00:41:58

I agree with maksymko, but not because I dislike sequences or autoincrementing numbers, as they have their place. If you need a value to be unique throughout your "various operational sites" i.e. not only within the confines of the single database instance, a globally unique identifier is a robust, simple solution.
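A hedged sketch of that suggestion in PostgreSQL terms, assuming user_id were declared as uuid rather than an integer, and assuming the pgcrypto extension for gen_random_uuid() (built in from PostgreSQL 13):

CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid() on older versions

-- The generated value is unique across all sites with no central coordination.
INSERT INTO "user" (sitecode, label, username, password, user_id)
VALUES ('SITE001', 'Test', 'test', 'test', gen_random_uuid());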

苄①跕圉湢 2024-10-23 00:41:57

There are a couple of points:

  • Postgres uses Multi-Version Concurrency Control (MVCC), so readers never wait on writers and vice versa. But there is of course a serialization that happens upon each write. If you are going to load bulk data into the system, look at the COPY command; it is much faster than running a large batch of INSERT statements.
  • The MAX(user_id) can be answered with an index, and probably is if there is an index on the user_id column. But the real problem is that if two transactions start at the same time, they will see the same MAX(user_id) value. That leads me to the next point:
  • The canonical way of handling numbers like user_ids is by using SEQUENCEs. These are essentially a place you can draw the next user id from. If you are really worried about the performance of generating the next sequence number, you can generate a batch of them per thread and then only request a new batch when it is exhausted (sometimes called a HiLo sequence) - see the sketch after this list.
  • You may want user_ids packed up nice and tight as increasing numbers, but I think you should try to get rid of that. The reason is that deleting a user_id will create a hole anyway, so I'd not worry too much if the sequence is not strictly contiguous.
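A sketch of the HiLo idea from the list above, assuming a PostgreSQL sequence and an illustrative block size of 100:

-- Each nextval() reserves a block of 100 ids; the application hands out
-- block_start .. block_start + 99 in memory, with no further database round trips.
CREATE SEQUENCE user_id_block START 1 INCREMENT BY 100;

SELECT nextval('user_id_block') AS block_start;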
墨小沫ゞ 2024-10-23 00:41:57

Yes, I can see a huge problem. Don't do it.

Multiple connections can get the EXACT SAME id at the same time. I was going to add "under load", but it doesn't even need load - it just needs the right timing between two queries.

To avoid it, you can use transactions or locking mechanisms or isolation levels specific to each DB, but once we get to that stage, you might as well use the dbms-specific sequence/identity/autonumber etc.

EDIT

For question EDIT-2: there is no reason to fear gaps in the user_id, so you could have one sequence across all sites. If gaps are OK, some options are:

  • use guaranteed update statements, such as (in SQL Server)

update tblsitesequenceno set @nextnum = nextnum = nextnum + 1

Multiple callers to this statement are each guaranteed to get a unique number.

  • use a single table that produces the identity/sequence/autonumber (db specific)

If you cannot have gaps at all, consider using a transaction mechanism that will restrict access while you are running the MAX() query. Either that, or use a proliferation of sequences / tables with identity columns / tables with autonumber, manipulated with dynamic SQL using the same technique as for a single sequence - a sketch follows.
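As a sketch of that last option in PostgreSQL terms, interpreting the proliferation as one sequence per site code (the function name, the naming scheme, and CREATE SEQUENCE IF NOT EXISTS, which needs PostgreSQL 9.5+, are all assumptions):

CREATE OR REPLACE FUNCTION next_user_id(p_sitecode text) RETURNS bigint AS $$
BEGIN
    -- One sequence per site code, created on first use via dynamic SQL.
    EXECUTE format('CREATE SEQUENCE IF NOT EXISTS user_id_seq_%s', lower(p_sitecode));
    RETURN nextval(format('user_id_seq_%s', lower(p_sitecode))::regclass);
END;
$$ LANGUAGE plpgsql;

-- Usage (illustrative):
-- INSERT INTO "user" (sitecode, label, username, password, user_id)
-- VALUES ('SITE001', 'Test', 'test', 'test', next_user_id('SITE001'));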
