Hibernate: strategies/patterns for mapping object and entity identity with composite keys?

Posted 2024-09-11 12:23:33


What is a general collision-free Java best practice to generate hash codes for multi-column primary keys composed of arbitrary atomic types?

I thought about it for a few hours and came to the conclusion that a string concatenated from all primary key columns would be the only reliable way to do this. Calling Java's hashCode method on that concatenated string should then yield a unique integer. (In effect this would somewhat mimic what a database index does, though I'm not sure about that.)

For a multi-column primary key of the form

CREATE TABLE PlayerStats
(
    game_id INTEGER,
    is_home BOOLEAN,
    player_id SMALLINT,
    roster_id SMALLINT,
    ... -- (game_id, is_home) FK to score, (player_id, roster_id) FK to team member
    PRIMARY KEY (game_id, is_home, player_id, roster_id)
)

a hash code could be calculated like:

@Override
public int hashCode()
{
    //                                                                 maxchars:
    String surrogate =   String.format("%011d", this.gameId)         // 11
                       + String.format("%01d" , this.isHome ? 1 : 0) // 1
                       + String.format("%06d" , this.playerId)       // 6
                       + String.format("%06d" , this.rosterId);      // 6

    System.out.println("surrogate = '" + surrogate + "'");

    return surrogate.hashCode();
}

Of course, this only works with HashSet and Hashtable if equals is based on the same fields.
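For reference, a minimal sketch of such a matching equals (assuming the key class is called PlayerStatsId and the fields are primitives, which is not stated above):

@Override
public boolean equals(Object o)
{
    if (this == o) return true;
    if (!(o instanceof PlayerStatsId)) return false;   // PlayerStatsId is an assumed class name
    PlayerStatsId other = (PlayerStatsId) o;
    return this.gameId   == other.gameId                // compare exactly the fields the hash is built from
        && this.isHome   == other.isHome
        && this.playerId == other.playerId
        && this.rosterId == other.rosterId;
}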

My question: is this a good general strategy?

I can see that on-the-fly calculation might not be the fastest. You might want to recalculate the hash code only when a composite key value changes (e.g. call a rehash() method from within every setter operating on a key property).
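As a rough illustration of that caching idea (cachedHash, rehash() and the setter shown here are assumptions, not code from the post):

import java.util.Objects;

public class PlayerStatsId
{
    private int     gameId;
    private boolean isHome;
    private short   playerId;
    private short   rosterId;

    private int cachedHash;   // recomputed whenever a key property changes

    public void setGameId(int gameId)
    {
        this.gameId = gameId;
        rehash();             // every setter touching a key field triggers a rehash
    }

    // ... analogous setters for isHome, playerId and rosterId ...

    private void rehash()
    {
        this.cachedHash = Objects.hash(gameId, isHome, playerId, rosterId);
    }

    @Override
    public int hashCode()
    {
        return cachedHash;    // O(1): the value is already precomputed
    }
}

Note that even with a cached value, mutating the key fields of an object already stored in a HashSet or used as a Hashtable key leaves it in the wrong bucket, so composite keys are best treated as effectively immutable.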

Suggestions and improvements welcome. Aren't there any generally known strategies for this? A pattern?


Answer by 活雷疯 (2024-09-18 12:23:33):


The hash code is used as an index to look up elements in the data set that have the same code. The equals method is then used to find matches within the set of elements that have the same hash code. As such, the generated hash code doesn't have to be 100% unique. It just needs to be "unique enough" to create a decent distribution among the data elements so that there isn't a need to invoke the equals method on a large number of items with the same hashCode value.

From that perspective, generating lots and lots of strings and computing hash codes on those strings seems like an expensive way to avoid an equals operation that consists of three integer comparisons and one boolean comparison. It also doesn't necessarily guarantee uniqueness of the hash code value.

My recommendation would be to start with a simple approach of having the hash code of the key being the sum of the hash codes of its constituents. If that doesn't provide a good distribution because all of the ids are in a similar range, you could try multiplying the ids by some different factors before summing.
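One conventional way to implement that suggestion is the standard multiply-and-add pattern (this is a sketch of that general pattern, not the answerer's code; java.util.Objects.hash(gameId, isHome, playerId, rosterId) gives a practically equivalent result):

@Override
public int hashCode()
{
    int result = Integer.hashCode(gameId);        // successive multiplications give each field a different weight
    result = 31 * result + Boolean.hashCode(isHome);
    result = 31 * result + Short.hashCode(playerId);
    result = 31 * result + Short.hashCode(rosterId);
    return result;
}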
