我使用哈希表(DotNET Dictionary 对象)作为稀疏二维数据集的一部分。哈希表中的大多数条目将靠近在一起。我最终可能会得到 100 ~ 10,000 个条目,所有条目都聚集在接近于零的位置。我读到,当散列分布在整个整数(32 位)范围内时,散列表的性能会更好。
有没有一种廉价的方法可以将连续的整数以 1:1 的方式映射到截然不同的值上?我不必将它们映射回来,这纯粹是单向的事情。
I'm using a hash table (DotNET Dictionary object) as part of a sparse two-dimensional data set. Most of the entries in the hash table will be close together. I'll probably end up with 100 ~ 10,000 entries, all of them clustered near zero. I've read that a Hash table performs better when the hashes are spread across the entire integer(32 bit) range.
Is there a cheap way to map consecutive integers onto wildly different values in a 1:1 fashion? I don't have to map them back, it's purely a one-way thing.
发布评论
评论(3)
也许我误解了你的意思,但字典已经对你的整数进行了哈希处理。不需要对它们进行预先哈希处理。为什么不尝试默认实现并看看它是如何进行的,而不是尝试进行预优化,这很可能是毫无意义的。
Maybe I'm misunderstanding what you are saying, but Dictionary will already hash your integers. There shouldn't be any need to pre-hash them. Why not try out the default implementation and see how it goes instead of attempting a pre-optimizion that will in all likelihood be pointless.
不要使用 Integer,而是编写一个继承 Integer 的类,并重写 GetHashCode 函数。这样你就不需要做任何事情,只需创建这个函数!
我能想到的均匀分布值的最简单方法是执行以下操作:
良好且均匀地分布,同时将工作量降至最低。
Instead of using Integer, write a class that Inherits from Integer, and override the GetHashCode function. This way you don't have to do anything but create this function!
The easiest way I can think of to spread out the values evenly is to do something like:
Nice and evenly split up, while keeping the effort to a minimum.
如果您知道键集的最大值 (kmax),则可以按常数因子(乘数)进行扩展,例如乘以固定素数,使乘积保持在最大整数大小 (2^31 - 1) 以下:
即最接近
(2^30) / kmax
的素数注意:确保使用的素数与哈希表中的桶数不同。
这是另一个解决方案:自从.NET Random 类将为相同的种子生成相同的值,您可以使用它来分发传入的密钥。
If you know the maximum value of your keyset (kmax), you could expand by a constant factor (multiplier), say multiply by a fixed prime number that keeps the product below the max integer size (2^31 - 1):
i.e. the nearest prime number to
(2^30) / kmax
Note: make sure the prime used is not the same as the number of buckets in the Hash table.
Here is another solution: Since the .NET Random class will generate the same value for the same seed, you could use that to distribute the incoming keys.