Reverse engineering String.GetHashCode

Posted 2024-12-19 00:43:46

String.GetHashCode's behavior depends on the process architecture, so it returns one value on x86 and a different value on x64. I have a test application that must run on x86, and it must predict the hash code output of an application that must run on x64.

Below is the disassembly of the String.GetHashCode implementation from mscorwks.

public override unsafe int GetHashCode()
{
      fixed (char* text1 = ((char*) this))
      {
            char* chPtr1 = text1;
            int num1 = 0x15051505;
            int num2 = num1;
            int* numPtr1 = (int*) chPtr1;
            for (int num3 = this.Length; num3 > 0; num3 -= 4)
            {
                  num1 = (((num1 << 5) + num1) + (num1 >> 0x1b)) ^ numPtr1[0];
                  if (num3 <= 2)
                  {
                        break;
                  }
                  num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr1[1];
                  numPtr1 += 2;
            }
            return (num1 + (num2 * 0x5d588b65));
      }
}

Can anybody port this function to a safe implementation?

Comments (4)

野稚 2024-12-26 00:43:46

Hash codes are not intended to be repeatable across platforms, or even multiple runs of the same program on the same system. You are going the wrong way. If you don't change course, your path will be difficult and one day it may end in tears.

What is the real problem you want to solve? Would it be possible to write your own hash function, either as an extension method or as the GetHashCode implementation of a wrapper class and use that one instead?
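
The "write your own hash function" suggestion above could look roughly like the following. This is only a sketch: the class and method names (StableStringHash, GetStableHashCode) are made up for illustration, and FNV-1a over the UTF-8 bytes is just one possible choice of algorithm, not something the answer prescribes. What matters is that the result depends only on the string contents, not on the runtime or the architecture.

```csharp
using System;
using System.Text;

// Hypothetical helper: a hash you own, so it cannot change underneath you.
public static class StableStringHash
{
    // 32-bit FNV-1a over the UTF-8 encoding of the string (an assumed choice).
    public static int GetStableHashCode(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        unchecked
        {
            uint hash = 2166136261;          // FNV-1a 32-bit offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= 16777619;            // FNV 32-bit prime
            }
            return (int)hash;
        }
    }
}
```

Because the algorithm is part of your own code base, "abc".GetStableHashCode() gives the same value in an x86 test process and an x64 production process, which is the property the question is really after.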

许你一世情深 2024-12-26 00:43:46

First off, Jon is correct; this is a fool's errand. The internal debug builds of the framework that we use to "eat our own dogfood" change the hash algorithm every day precisely to prevent people from building systems -- even test systems -- that rely on unreliable implementation details that are documented as subject to change at any time.

Rather than enshrining an emulation of a system that is documented as being not suitable for emulation, my recommendation would be to take a step back and ask yourself why you're trying to do something this dangerous. Is it really a requirement?

Second, StackOverflow is a technical question and answer site, not a "do my job for me for free" site. If you are hell bent on doing this dangerous thing and you need someone who can rewrite unsafe code into equivalent safe code then I recommend that you hire someone who can do that for you.

雨后彩虹 2024-12-26 00:43:46

While all of the warnings given here are valid, they don't answer the question. I had a situation in which GetHashCode() was unfortunately already being used for a persisted value in production, and I had no choice but to re-implement the default .NET 2.0 32-bit x86 (little-endian) algorithm. I re-coded it without unsafe code, as shown below, and it appears to work. Hope this helps someone.

// The GetStringHashCode() extension method is equivalent to the Microsoft .NET Framework 2.0
// String.GetHashCode() method executed on 32 bit systems.
public static int GetStringHashCode(this string value)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;

    int len = value.Length;
    int intval;
    int c0, c1;
    int i = 0;
    while (len > 0)
    {
        c0 = (int)value[i];
        c1 = len > 1 ? (int)value[i + 1] : 0;
        intval = c0 | (c1 << 16);
        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ intval;
        if (len <= 2)
        {
            break;
        }
        i += 2;
        c0 = (int)value[i];
        c1 = len > 3 ? (int)value[i + 1] : 0;
        intval = c0 | (c1 << 16);
        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ intval;
        len -= 4;
        i += 2;
    }

    return hash1 + (hash2 * 1566083941);
}
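
The reason the little-endian detail matters: the original unsafe code reads two adjacent 16-bit chars as one 32-bit int, and the port above replaces that read with c0 | (c1 << 16). The sketch below (the names PackDemo, PackChars and ReadAsInt are made up for illustration) shows that the two agree on a little-endian machine.

```csharp
using System;

// Hypothetical demo class, not part of the answer's code.
public static class PackDemo
{
    // What the safe port does: first char in the low 16 bits, second in the high 16 bits.
    public static int PackChars(char c0, char c1)
    {
        return c0 | (c1 << 16);
    }

    // What the unsafe pointer read effectively does: reinterpret 4 bytes of UTF-16 data as an int.
    public static int ReadAsInt(char c0, char c1)
    {
        byte[] bytes = new byte[4];
        BitConverter.GetBytes(c0).CopyTo(bytes, 0);   // 2 bytes, machine byte order
        BitConverter.GetBytes(c1).CopyTo(bytes, 2);
        return BitConverter.ToInt32(bytes, 0);
    }
}
```

On a little-endian machine (BitConverter.IsLittleEndian is true) both methods return 0x00620061 for ('a', 'b'). For an odd-length string the second char is the string's terminating '\0', which is why the port substitutes 0 there.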

心清如水 2024-12-26 00:43:46

The following exactly reproduces the default String hash codes on .NET 4.7 (and probably earlier). This is the hash code given by:

  • Default on a String instance: "abc".GetHashCode()
  • StringComparer.Ordinal.GetHashCode("abc")
  • Various String methods that take StringComparison.Ordinal enumeration.
  • System.Globalization.CompareInfo.GetStringComparer(CompareOptions.Ordinal)

Testing on release builds with full JIT optimization, these versions modestly outperform the built-in .NET code, and have also been heavily unit-tested for exact equivalence with .NET behavior. Notice there are separate versions for x86 versus x64. Your program should generally include both; below the respective code listings is a calling harness which selects the appropriate version at runtime.

x86   -   (.NET running in 32-bit mode)

static unsafe int GetHashCode_x86_NET(int* p, int c)
{
    int h1, h2 = h1 = 0x15051505;

    while (c > 2)
    {
        h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;
        h2 = ((h2 << 5) + h2 + (h2 >> 27)) ^ *p++;
        c -= 4;
    }

    if (c > 0)
        h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;

    return h1 + (h2 * 0x5d588b65);
}

x64   -   (.NET running in 64-bit mode)

static unsafe int GetHashCode_x64_NET(Char* p)
{
    int h1, h2 = h1 = 5381;

    while (*p != 0)
    {
        h1 = ((h1 << 5) + h1) ^ *p++;

        if (*p == 0)
            break;

        h2 = ((h2 << 5) + h2) ^ *p++;
    }
    return h1 + (h2 * 0x5d588b65);
}

Calling harness / extension method for either platform (x86/x64):

readonly static int _hash_sz = IntPtr.Size == 4 ? 0x2d2816fe : 0x162a16fe;

public static unsafe int GetStringHashCode(this String s)
{
    // Note: the x64 string hash ignores everything after an embedded '\0' char (unlike x86)
    if (s.Length == 0 || (IntPtr.Size == 8 && s[0] == '\0'))
        return _hash_sz;

    fixed (char* p = s)
        return IntPtr.Size == 4 ?
            GetHashCode_x86_NET((int*)p, s.Length) :
            GetHashCode_x64_NET(p);
}
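
Since the original question asked for a safe implementation that an x86 process can use to predict x64 hashes, here is a sketch of the x64 listing above rewritten without unsafe code. It is a straight transcription of that listing, not something checked against every .NET version; the names SafeStringHash and GetHashCodeX64 are made up for illustration. It deliberately does not look at IntPtr.Size, so it computes the x64-style value in any process.

```csharp
using System;

// Hypothetical helper: safe transcription of the x64 listing above.
public static class SafeStringHash
{
    public static int GetHashCodeX64(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        unchecked
        {
            int h1 = 5381, h2 = 5381;
            int i = 0;

            // The pointer version stops at the first '\0', which is either an
            // embedded null char or the terminator just past the end of the string.
            while (i < s.Length && s[i] != '\0')
            {
                h1 = ((h1 << 5) + h1) ^ s[i++];

                if (i >= s.Length || s[i] == '\0')
                    break;

                h2 = ((h2 << 5) + h2) ^ s[i++];
            }
            return h1 + (h2 * 0x5d588b65);
        }
    }
}
```

For the empty string this returns 0x162a16fe, the same constant the calling harness above uses for the x64 case. As in the listing, anything after an embedded '\0' is ignored, so "a" and "a\0zzz" hash to the same value.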