不是“int GetHashCode”吗?有点短视?

发布于 2024-08-17 16:04:43 字数 572 浏览 4 评论 0原文

鉴于 .Net 能够通过 IntPtr 检测位数(尽管通过反射器查看,其中很大一部分被标记为不安全 - 耻辱),我一直认为 GetHashCode 返回 int 可能是短视的。

我知道最终使用良好的散列算法,Int32 提供的数十亿种排列绝对足够,但即便如此,可能的散列集越窄,散列键查找就越慢,因为需要更多的线性搜索。

同样 - 我是唯一一个觉得这很有趣的人:

struct Int64{
  public override int GetHashCode()
  {
    return (((int) this) ^ ((int) (this >> 0x20)));
  }
}

而 Int32 只是返回 this

如果 IntPtr 由于性能问题而无法解决,那么实现 IEquatable 等的 IHashCode 可能更好?

随着我们的平台在内存容量、磁盘大小等方面变得越来越大,32 位哈希就足够的日子肯定已经屈指可数了吗?

或者只是这样一种情况,即通过接口抽象出哈希值或根据平台调整哈希值所涉及的开销超过了任何潜在的性能优势?

Given that .Net has the ability to detect bitness via IntPtr (looking through reflector a good amount of it is marked unsafe, though - shame) I've been thinking that GetHashCode returning an int is potentially short-sighted.

I know that ultimately with a good hashing algorithm the billions of permutations offered by Int32 are absolutely adequate, but even so, the narrower the possible set of hashes the slower hashed key lookups are as more linear searching will be required.

Equally - am I the only one who finds this amusing:

struct Int64{
  public override int GetHashCode()
  {
    return (((int) this) ^ ((int) (this >> 0x20)));
  }
}

Whilst Int32 simply returns this.

If IntPtr is out of question because of performance concerns, perhaps an IHashCode that implements IEquatable etc is better?

As our platforms get larger and larger in terms of memory capacity, disk size etc, surely the days of 32 bit hashes being enough are potentially numbered?

Or is it simply the case that the overhead involved in either abstracting out the hash via interfaces, or adapting the size of the hash according to the platform outweighs any potential performance benefits?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

愛上了 2024-08-24 16:04:43

Int64 哈希函数用于确保考虑所有位 - 因此基本上它是将顶部 32 位与底部 32 位进行异或。我真的无法想象更好的通用用途。 (截断为 Int32 是没有用的 - 那么如何正确地散列低 32 位全为零的 64 位值?)

如果使用 IntPtr 作为散列返回值,则代码必须具有条件分支(是它是 32 位吗?是 64 位吗?等等),这会减慢哈希函数的速度,从而破坏整个点。

我想说,如果您有一个实际上有 20 亿个存储桶的哈希表,那么您可能正处于编写整个自定义系统的阶段。 (数据库可能是更好的选择?)在这种规模下,确保桶均匀填充将是一个更紧迫的问题。 (换句话说,更好的哈希函数可能会比更多数量的存储桶带来更多的红利)。

如果您确实想要在内存中使用多 GB 的映射,那么没有什么可以阻止您实现具有等效 64 位哈希函数的基类。但是,您必须编写自己的字典等效项。

The Int64 hash function is there to make sure that all the bits are considered - so basically it is XORing the top 32 bits with the bottom 32 bits. I can't really imagine a better general-purpose one. (Truncating to Int32 would be no good - how could you then properly hash 64-bit values which had all zeros in the lower 32 bits?)

If IntPtr were used as the hash return value, then code would have to have conditional branches (is it 32-bit? is it 64-bit? etc), which would slow down the hash functions, defeating the whole point.

I would say that if you have a hashtable which actually has 2 billion buckets, you're probably at the stage of writing an entire custom system anyway. (Possibly a database would be a better choice?) At that size, making sure the buckets were filled evenly would be a more pressing concern. (In other words, a better hash function would probably pay more dividends than a larger number of buckets).

There would be nothing to stop you implementing a base class which did have an equivalent 64-bit hash function, if you did want a multi-gigabyte map in memory. You'd have to write your own Dictionary equivalent however.

叫思念不要吵 2024-08-24 16:04:43

确实意识到GetHashCode返回的哈希码用于在哈希表中寻址?使用更大的数据类型将是徒劳的,因为无论如何所有哈希表都较小。额外的信息只会被浪费,因为它不能被充分利用。

常见的哈希表具有几千到几百万个条目。 32 位整数足以覆盖这个索引范围。

You do realize that the hash code returned by GetHashCode is used for addressing in a hash table? Using a bigger data type would be a futile exercise since all hash tables are smaller anyway. Additional information would simply be wasted because it cannot be used adequately.

Common hash tables have in the order of a few thousand to a few million entries. A 32 bit integer is more than sufficient to cover this range of indices.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文