Using GetHashCode to store hash values in a database?

Posted 2024-09-30 01:05:12

We are currently using the GetHashCode method extensively to store hash codes in a database for tracking unique items. MSDN has a scary entry about this:

"The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes."

We have been using this approach for several years without issue. Should we be worried, and if so what would be a better approach?

To elaborate, the data comes from an external source. We take two to three string fields, concatenate them into a new string, and then call GetHashCode on that string.
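One possible replacement, sketched under assumptions (.NET 5 or later; `StableKey` is a hypothetical helper name): hash the fields with a fixed, documented algorithm such as SHA-256 instead of GetHashCode, and length-prefix each field before hashing. This targets two weaknesses of the approach described: string.GetHashCode is not guaranteed to be stable across framework versions (on .NET Core and later it is even randomized per process), and plain concatenation makes ("ab", "c") and ("a", "bc") identical before any hashing happens.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a stable 256-bit key from the source fields.
// SHA-256 output is fixed by the algorithm, not by the runtime version or process.
string StableKey(params string[] fields)
{
    using var buffer = new MemoryStream();
    using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
    {
        foreach (string field in fields)
        {
            // Write(string) prefixes each field with its byte length,
            // so ("ab", "c") and ("a", "bc") produce different input bytes.
            // Assumption: null is treated the same as an empty string.
            writer.Write(field ?? string.Empty);
        }
    }
    byte[] digest = SHA256.HashData(buffer.ToArray());
    return Convert.ToHexString(digest); // 64 hex characters
}

// Plain concatenation cannot tell these two records apart; the length-prefixed key can.
Console.WriteLine("ab" + "c" == "a" + "bc");                    // True
Console.WriteLine(StableKey("ab", "c") == StableKey("a", "bc")); // False
```

The 64-character result fits a fixed-length column. A 256-bit digest makes accidental collisions negligible in practice, but it is still a hash, not a uniqueness guarantee.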

几度春秋 2024-10-07 01:05:12

Using a hash code as a unique identifier is a really bad idea because you're eventually guaranteed to have collisions if the collection is large enough, and it doesn't have to be very large before a collision becomes statistically likely. Hash codes are a good, quick way to check whether two objects might be the same (assuming the same hash function): if they hash to different values, they are definitely different. If they hash to the same value, however, you need an equality comparison to make sure that they are the same object. At that point you compare the properties that make the object unique, i.e., if those properties are equal, the objects are equal.
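To put a number on "statistically likely": assuming the 32-bit hash codes behave like uniformly random values (an idealization; a poorly distributed GetHashCode does worse), the birthday approximation gives the probability p(n) of at least one collision among n distinct items, and the pigeonhole principle makes a collision certain beyond 2^32 items:

$$
p(n) \approx 1 - e^{-n(n-1)/(2 \cdot 2^{32})}, \qquad
p(n) = \tfrac{1}{2} \;\Rightarrow\; n \approx \sqrt{2 \cdot 2^{32} \ln 2} \approx 77{,}163, \qquad
n > 2^{32} = 4{,}294{,}967{,}296 \;\Rightarrow\; p(n) = 1
$$

So a table of roughly 77,000 tracked items already has about even odds of containing two different items with the same hash code.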

I'd suggest using a unique index in the database on the natural key properties in conjunction with an artificial, autoincrement id as the primary key. Then you can be sure that you don't get duplicate insertions in the DB (uniqueness constraint of the index), but you can quickly compare the objects outside the DB by simply comparing whether they have the same id -- also guaranteed to be unique by the primary key constraint.
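The design in the paragraph above can be imitated in memory to show why it avoids the collision problem. This is an analogy, not database code (assumes .NET 5 or later): the `Dictionary` stands in for the unique index on the natural key, `nextId` for the auto-increment primary key, and the field names `Source`, `Code` and `Name` are hypothetical stand-ins for the question's two to three string fields.

```csharp
using System;
using System.Collections.Generic;

// "Unique index": natural key -> surrogate id. Field names are hypothetical.
var naturalKeyIndex = new Dictionary<(string Source, string Code, string Name), int>();
int nextId = 1; // "auto-increment primary key"

int GetOrAddId(string source, string code, string name)
{
    var key = (source, code, name);
    // The lookup uses the tuple's hash code only to find candidates and Equals to confirm,
    // so two different keys that share a hash code never receive the same id.
    if (naturalKeyIndex.TryGetValue(key, out int existingId))
        return existingId; // natural key already present: no duplicate row
    int id = nextId++;
    naturalKeyIndex.Add(key, id);
    return id;
}

int first = GetOrAddId("feedA", "X1", "Widget");
int again = GetOrAddId("feedA", "X1", "Widget");
int other = GetOrAddId("feedA", "X2", "Widget");
Console.WriteLine($"{first} {again} {other}"); // 1 1 2
```

Outside the database, comparing the integer ids is then both fast and exact.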

记忆之渊 2024-10-07 01:05:12

Yes. Be scared. GetHashCode returns a 32-bit int, so it cannot possibly guarantee freedom from collisions for any type with more than 2^32 possible values. And because some implementations of GetHashCode are less than perfect (e.g. some classes implement their own poorly distributed version), the risk can be higher still in those cases. Regardless, this is a bad approach and needs a rethink.
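A quick way to see this in practice is to hash distinct strings until two of them collide. A sketch assuming .NET 5 or later; the `"item-"` prefix and the 2,000,000 cap are arbitrary illustration choices. On .NET Core and later, string hash codes are randomized per process, so the colliding pair usually changes from run to run, which is another reason such values must not be persisted.

```csharp
using System;
using System.Collections.Generic;

// Hashes distinct strings until two of them share a hash code.
// With a well-distributed 32-bit hash, the chance of a hit passes 50%
// at roughly 77,000 strings, so the cap is only a safety limit.
(string First, string Second, int Hash)? FindCollision(int maxItems)
{
    var seen = new Dictionary<int, string>();
    for (int i = 0; i < maxItems; i++)
    {
        string candidate = "item-" + i; // every candidate is a distinct string
        int hash = candidate.GetHashCode();
        if (seen.TryGetValue(hash, out var earlier))
            return (earlier, candidate, hash); // different strings, same hash code
        seen[hash] = candidate;
    }
    return null;
}

var collision = FindCollision(2_000_000);
if (collision.HasValue)
{
    var pair = collision.Value;
    Console.WriteLine($"\"{pair.First}\" and \"{pair.Second}\" both hash to {pair.Hash}");
    Console.WriteLine($"Same string? {pair.First == pair.Second}"); // always False here
}
```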

I'd suggest a bit of reading on how hash tables work so that you better understand the purpose of a hash code. It's really only a heuristic for fast storage and lookup, not an identifier.
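The role a hash code plays in a hash table can be sketched as a minimal separate-chaining table: the hash code only picks a bucket, and an equality check decides whether an entry really matches. This is a simplified illustration (16 fixed buckets, no resizing, .NET 5 or later), not how `Dictionary<TKey,TValue>` is implemented in detail.

```csharp
using System;
using System.Collections.Generic;

const int BucketCount = 16;
var buckets = new List<string>[BucketCount];

// Clear the sign bit so the modulo gives a valid bucket index.
int BucketFor(string key) => (key.GetHashCode() & 0x7FFFFFFF) % BucketCount;

void Add(string key)
{
    int b = BucketFor(key);
    buckets[b] ??= new List<string>();
    if (!Contains(key)) buckets[b].Add(key);
}

bool Contains(string key)
{
    var bucket = buckets[BucketFor(key)];
    if (bucket is null) return false; // empty bucket: definitely absent
    foreach (var entry in bucket)
        if (string.Equals(entry, key)) return true; // hash narrowed the search; Equals confirms
    return false; // same bucket, but no equal entry
}

Add("alpha");
Console.WriteLine(Contains("alpha")); // True
Console.WriteLine(Contains("gamma")); // False
```

Many different keys land in the same bucket; that is expected and harmless here precisely because the final decision is made by `Equals`, not by the hash code.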

做个少女永远怀春 2024-10-07 01:05:12

GetHashCode is not reliable for this purpose.

You have two choices in this regard:

  1. Replace GetHashCode with your own method that returns a Guid instead of
     an integer (an override of GetHashCode must still return an int, so
     this has to be a separate method).
  2. Let your DB create unique id values for you.
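For the first option, GetHashCode itself cannot be made to return a Guid because its return type is int, so a separate method is needed. A minimal sketch, assuming .NET 5 or later and a hypothetical `KeyAsGuid` helper: hash the length-prefixed fields with SHA-256 and keep the first 16 bytes as a Guid. The result is deterministic, but it is not an RFC 4122 name-based UUID, since no version or variant bits are set.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: deterministic 128-bit Guid from the source fields.
Guid KeyAsGuid(params string[] fields)
{
    using var buffer = new MemoryStream();
    using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
    {
        foreach (string field in fields)
            writer.Write(field ?? string.Empty); // length-prefixed: field boundaries are preserved
    }
    byte[] digest = SHA256.HashData(buffer.ToArray());
    return new Guid(digest[..16]); // truncate the 256-bit digest to 128 bits
}

Console.WriteLine(KeyAsGuid("feedA", "X1", "Widget"));
```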