数据库和 JPA 中的 PostgreSQL 序列 ID 不同

发布于 2024-10-18 00:32:03 字数 2589 浏览 1 评论 0原文

我真的很困惑......但首先,让我给你一个粗略的概述。

我通过将 4 个表合并为两个来对数据库进行一些重组。所有表都以简单的数字序列作为主键。事实上,这些桌子成对地非常(非常)相似。它们被分成两部分的唯一原因是基于必须导入的历史数据。如果没有这种拆分,就会出现大量冗余,从概念上讲这是有道理的。

现在,经过大量数据清理工作后,现在终于可以合并它们并简单地使用其中一个字段作为鉴别器。说得更抽象一些,这些表包含公司。他们要么是当地居民,要么不是当地居民(两个阶层)。可以通过邮政编码(鉴别器字段)轻松区分它们。这些表的维度正在缓慢变化(序列是代理键)。另外两个表包含附加到这些 SCD 的普通数据。因此,有 4 张桌子。本地公司 2 个,非本地公司 2 个。

这些表现已简化并合并,因此我现在只有 CompanyCompanyData

为了安全起见,并且不丢失任何历史信息,我创建了两个带有新序列字段的新表。旧的序列被保留,以防 10 年后我意识到出了问题;)

到目前为止一切顺利。

重组相当容易,重新连接正确的条目也很容易。接下来,我需要更新与该数据库接口的应用程序,这需要更多工作,但仍然很容易。该应用程序使用 JPA,并在 PostgreSQL 9.0 数据库之上使用 EclipseLink 2.0(如前所述)。

奇怪的部分来了:

当我尝试插入一家新公司时,我收到重复键错误,指出给定的 ID 已经存在。但这应该由序列对象处理......不是吗?

所以我做了一些挖掘。我可以验证后续惰性确实返回了带有递增 id 的重复键错误。这意味着顺序逻辑是正确的。唯一的问题是当前值太低。因此,对 nextval(或 JPA 使用的任何内容)的调用将返回一个已经存在的 ID。

我在 JPA 实体中有以下内容:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "enterprise_id_seq")
@Column(name = "id", nullable = false)
private Integer id;

我的序列如下所示:

test_db=# \d enterprise_id_seq 
      Sequence "public.enterprise_id_seq"
    Column     |  Type   |        Value        
---------------+---------+---------------------
 sequence_name | name    | enterprise_id_seq
 last_value    | bigint  | 19659
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 32
 is_cycled     | boolean | f
 is_called     | boolean | t

我得到的错误如下:

[...]

Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.1.v20100213-
r6600): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "enterprise_pkey"
    Detail: Key (id)=(19611) already exists.
Error Code: 0
Call: INSERT INTO en...

[...]

如您所见,它尝试插入 id 为 19611 的实体,但最后一个值序列为19659。这显然是错误的。

我还尝试重新启动所有这一切背后的应用程序服务器,以便关闭所有打开的连接和会话。不走运...我注意到的另一件事:该字段被定义为Integer。它应该是Long吗?这需要对代码进行相当多的更改,而我还没有时间来解决这个问题。

由于我仅落后 50 个条目,我可以简单地尝试运行插入 50 次,但我宁愿知道到底出了什么问题......

我在这里错过了什么?

更新:经过更多挖掘,我发现allocationSize,默认值为 50。有趣的是,这与我看到的 ID 差异非常接近。由于一些测试和错误处理,它可能不是 100% 相同。可能有关系吗?老实说,我还没有理解这个设置背后的想法......

I am genuinely puzzled... but first, let me give you a rough overview.

I've done some restructuring in a database, by merging 4 tables into two. All tables have simple numeric sequences as primary keys. The tables were in fact in pairs very (very) similar. The only reason why they were split in two was based on historic data that had to be imported. Without this split, there would have been a lot of redundancy, and conceptually it made sense.

Now, after a good deal of work went into data cleansing, it's now finally possible to merge them and simply use one of the fields as discriminator. To talk less abstract, the tables contain companies. And they are either local residents or not (the two classes). They can easily be distinguished by their post code (the discriminator field). These tables are slowly changing dimensions (the sequence being the surrogate key). The other two tables contain the normal data attached to these SCDs. Hence, 4 tables. 2 for local companies, and 2 for non-locals.

These tables have now been simplified and merged, so I now only have Company and CompanyData.

To be on the safe side, and not to lose any historic information I created the two new tables with new sequence fields. The old sequences are kept in case 10 years from now I realise something went wrong ;)

So far so good.

The restructuring was fairly easy, reconnecting the correct entries was a no-brainer as well. Next, I needed to update the application which interfaces with this DB, which was a bit more work, but still easy. The application uses JPA using EclipseLink 2.0 on top of - as said already - a PostgreSQL 9.0 database.

And here comes the weird part:

When I try to insert a new company, I get a duplicate key error, stating that the given ID exists already. But that should be handled by the sequence object... should it not?

So I did some digging around. I could verify that subsequent inerts indeed returned duplicate key errors with incrementing ids. This means, that the sequence logic is OK. The only problem is that the current value is too low. So a call to nextval (or whatever JPA uses) will return an ID which already exists.

I have the following in the JPA-Entity:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "enterprise_id_seq")
@Column(name = "id", nullable = false)
private Integer id;

And my sequence looks like this:

test_db=# \d enterprise_id_seq 
      Sequence "public.enterprise_id_seq"
    Column     |  Type   |        Value        
---------------+---------+---------------------
 sequence_name | name    | enterprise_id_seq
 last_value    | bigint  | 19659
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 32
 is_cycled     | boolean | f
 is_called     | boolean | t

The errors I get are these:

[...]

Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.1.v20100213-
r6600): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "enterprise_pkey"
    Detail: Key (id)=(19611) already exists.
Error Code: 0
Call: INSERT INTO en...

[...]

As you can see, it tries to insert an entity with id 19611, but the last value on the sequence is 19659. Which is clearly wrong.

I also tried to restart the application server behind all this so to close all open connections ans sessions. No luck... Another thing I noticed: The field is defined as Integer. Should it rather be Long? That would necessitate quite some changes in the code, and I did not yet have time to fix this.

As I'm only 50 entries behind I could simply try to run the insert 50 times, but I'd rather know exactly what went wrong...

What am I missing here?

Update: After some more digging, I came across allocationSize which has a default of 50. Interestingly, this comes quite close to the difference of IDs I'm seeing. It may not be 100% the same due to some testing and frobnicating. Could it be related? I honestly have not understood the idea behind this setting...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

ㄖ落Θ余辉 2024-10-25 00:32:03

当然,对于 Hibernate,如果使用 GenerationType.SEQUENCE,则默认使用 hi/lo 策略,最多在从数据库返回的值之前分配 allocationSize id。将allocationSize 设置为1,它应该做正确的事情。

之前对非常类似问题的回答: Hibernate为 PostgreSQL 插入生成两个不同的序列 ID

Certainly for Hibernate, the default if using GenerationType.SEQUENCE is to use a hi/lo strategy, up to allocationSize ids ahead of the value returned from the database. Set allocationSize to 1 and it should do the right thing.

A previous answer to a very similar problem: Hibernate generating two different sequence Ids for PostgreSQL insert

弥繁 2024-10-25 00:32:03

是的,这是因为您的 allocateSize 是 50(默认值)。我们 EclipseLink 的 next_value 假设增量为 50,因此前面的 50 个 id 也是如此。

AllocationSize 必须与您的序列增量匹配。我建议您将序列增量更新为 50,这将允许序列预分配,从而大大提高您的性能。

如果你想坚持使用 1,那么将注释中的 AllocationSize 更改为 1。

我建议 id 使用 long,但 int 最多 4,294,967,296 是安全的,所以取决于你是否认为你将有超过 40 亿行您的应用程序的生命周期。

Yes, this is because your allocationSize is 50, (the default). We EclipseLink does next_value is assumes the increment was 50, so has the previous 50 ids.

The allocationSize must match your sequence increment. I would recommend you update your sequence increment to be 50, this will allow sequence preallocation which will drastically improve your performance.

If you wish to stick with 1, then change your allocationSize in your annotation to 1.

I would recommend a long for an id, but an int will be safe up to 4,294,967,296, so depends if you think you will have more than 4billion rows in the life of your application.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文