Converting part of an MD5 hash code into a long hash code

Posted on 2024-11-17 08:29:21

I'm using the MD5 algorithm to hash the key for an on-disk hash table (I know it's questionable whether this is the best algorithm for the job, but I'm going with it for now; the problem generalizes to any algorithm that produces a byte array). My problem is this:

The size of the hash code determines the number of combinations (buckets) in the hash table. Since MD5 is 128 bits, there are a huge number of combinations (~ 3.4e38), which is way too many for my purpose. So what I want to do is pick off the first n bits of the byte array that MD5 produces and convert those into a long (or ulong) value. Since MD5 produces a byte array, this would be easy if I wanted a whole number of bytes, but each extra byte multiplies the number of combinations by 256, which is too big a jump. I'm finding the bit-level version to be a lot trickier.

Goal:

n = 10  // I.e. I want 2^10 combinations
long pos = someFcn(byte[] key, n)

where key is the value being hashed, and n is the number of bits of the MD5 result I want to use. pos, then, will be an integer from 0 to 1023 (in the case of n = 10). If n = 11, pos will range from 0 to 2^11-1 = 2047, and so on. It has to be reasonably fast and efficient.

Doesn't seem that hard but it's eluding me. Any help would be much appreciated. Thanks.

Comments (4)

恬淡成诗 2024-11-24 08:29:21

First, convert the first four bytes into an integer with BitConverter.ToInt32. It reads four bytes no matter what, but this probably won't make it measurably slower, since you're working with 32-bit registers for the rest of the calculations anyway, and special cases like "if n < 16, do this with the first two bytes" would just make it more complicated.

Then, given that integer, take the lowest N bits. If you really want a specific number of bits [a power-of-two number of buckets] not known at compile time, ~((-1)<<N) is a nice trick to get 2^N-1, which you can then use as the mask.

Or you could simply use ToUInt32 instead and take the result modulo a prime number [it might be slightly better to convert to UInt64 instead; then you've got fully half the bits to start with, in this case].
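
The two options above can be sketched in C, the language used by another answer on this page. This is only an illustration under stated assumptions: the helper names first_u32_le, low_bits and bucket_mod are hypothetical, and the little-endian byte order is assumed to mirror what BitConverter returns on a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical helper: assemble the first four digest bytes into a
   32-bit value, least significant byte first (assumed byte order). */
uint32_t first_u32_le(const unsigned char *digest)
{
    return (uint32_t)digest[0]
         | ((uint32_t)digest[1] << 8)
         | ((uint32_t)digest[2] << 16)
         | ((uint32_t)digest[3] << 24);
}

/* Keep the lowest n bits of value, for 0 <= n <= 32.
   ~(~0 << n) is the 2^n - 1 mask from the answer; n == 32 is handled
   separately because shifting a 32-bit value by 32 is undefined in C. */
uint32_t low_bits(uint32_t value, unsigned n)
{
    if (n >= 32)
        return value;
    return value & ~(~(uint32_t)0 << n);
}

/* Alternative: reduce modulo a bucket count (for example a prime). */
uint32_t bucket_mod(uint32_t value, uint32_t bucket_count)
{
    return value % bucket_count;
}
```

With digest bytes 0xAB, 0xCD, 0xEF, 0x01, first_u32_le gives 0x01EFCDAB and low_bits(..., 10) gives 0x1AB (427). Note that this keeps the lowest bits of the assembled integer rather than the first bits of the byte array; for a well-mixed hash either choice should spread keys equally well.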

亚希 2024-11-24 08:29:21

To obtain the first 10 bits, for example:

int result = ((int)key[0] << 2) | (((int)key[1] >> 6) & 0x03);
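
The same expression can be generalized to any n from 1 to 16 by joining the first two bytes and shifting away the unwanted low bits. The sketch below is in C and assumes key holds the hash bytes (at least two of them); the name first_bits16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: first n bits (1 <= n <= 16) of a byte array,
   most significant bit first. The caller must supply at least 2 bytes. */
uint32_t first_bits16(const unsigned char *key, unsigned n)
{
    /* Join the first two bytes into a 16-bit value, first byte on top. */
    uint32_t top16 = ((uint32_t)key[0] << 8) | (uint32_t)key[1];

    /* Drop the 16 - n low bits so only the first n bits remain. */
    return top16 >> (16 - n);
}
```

For key[0] = 0xAB, key[1] = 0xCD and n = 10, this returns 0x2AF (687), the same value the 10-bit expression above produces.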

忆离笙 2024-11-24 08:29:21

If you have an array like this,

unsigned char data[2000];

then you can just scrape off the first n bits into an integer like so:

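#include <stddef.h>

typedef unsigned long long int MyInt;

/* Returns the first n bits of data (most significant bit first) as an integer.
   Requires n <= 64 and at least (n + 7) / 8 bytes in data. */
MyInt scrape(size_t n, const unsigned char * data)
{
    MyInt result = 0;
    size_t b;

    /* Consume the whole bytes first. */
    for (b = 0; b < n / 8; ++b)
    {
        result <<= 8;
        result += data[b];
    }

    /* Then take the top remaining_bits bits of the next byte, if any.
       The check avoids reading data[b] past the end when n is a multiple of 8. */
    const size_t remaining_bits = n % 8;
    if (remaining_bits != 0)
    {
        result <<= remaining_bits;
        result += (data[b] >> (8 - remaining_bits));
    }

    return result;
}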

I'm assuming that CHAR_BIT == 8; feel free to generalize the code if you like. Also, the size of the array times 8 must be at least n, and n must not exceed the number of bits in MyInt.
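
The two preconditions just mentioned can also be checked in code. The following is a minimal sketch of a bounds-checked variant, not a replacement for the function above; the name first_n_bits, the len parameter, and the return-code convention are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked variant: writes the first n bits of data
   (most significant bit first) to *out. Returns 1 on success, or 0 if
   n exceeds 64 or data (len bytes) does not contain n bits. */
int first_n_bits(const unsigned char *data, size_t len, unsigned n, uint64_t *out)
{
    uint64_t result = 0;
    size_t full = n / 8;   /* number of whole bytes to consume */
    unsigned rem = n % 8;  /* bits to take from the following byte */
    size_t i;

    if (n > 64 || (size_t)(n + 7) / 8 > len)
        return 0;

    for (i = 0; i < full; ++i)
        result = (result << 8) | data[i];

    /* Only touch the next byte when a partial byte is actually needed. */
    if (rem != 0)
        result = (result << rem) | (uint64_t)(data[full] >> (8 - rem));

    *out = result;
    return 1;
}
```

For a 16-byte MD5 digest, len would be 16. With leading bytes 0xAB, 0xCD and n = 10 the result is 0x2AF (687), which always lies in the range 0 to 1023 as the question requires.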

感性不性感 2024-11-24 08:29:21
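
// Requires: using System; using System.Linq;
// HashToLong is a hypothetical wrapper name; the original snippet was a bare fragment.
static long HashToLong(string input)
{
    using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
    {
        // Encoding.ASCII maps non-ASCII characters to '?'; use another encoding if keys may contain them.
        byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
        // Keep the last 7 bytes (56 bits) of the digest so the value fits in a non-negative long.
        byte[] hashBytes = md5.ComputeHash(inputBytes).TakeLast(7).ToArray();
        var hashStr = BitConverter.ToString(hashBytes).Replace("-", "");
        var res = long.Parse(hashStr, System.Globalization.NumberStyles.HexNumber);
        return res;
    }
}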