How do I generate a unique hash code based on an object's contents?

Posted on 2024-10-30 18:56:43

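I need to generate a unique hash code based on the contents of an object, e.g. DateTime(2011,06,04) should equal DateTime(2011,06,04).

I can't use .GetHashCode() because it might generate the same hash code for objects with different contents.
I can't use .GetID from ObjectIDGenerator because it generates a different hash code for objects with the same contents.
If the object contains other sub-objects, it needs to check these recursively.
It needs to work on collections.

Why do I need to write this? I'm writing a caching layer using PostSharp.

Update

I think I may have been asking the wrong question. As Jon Skeet pointed out, to be on the safe side, I need as many unique combinations in the cache key as there are combinations of potential data in the object. Therefore, the best solution might be to use reflection to build a long string that encodes the public properties of the object. The objects are not too large, so this is quick and efficient:

Constructing the cache key is efficient (just convert the public properties of the object into one big string).
Checking for a cache hit is efficient (compare two strings).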

自我难过 2024-11-06 18:56:43

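From a comment:

I want something like a GUID based on the object's contents. I don't mind if there's the occasional duplicate every 10 trillion trillion years or so

That seems like an unusual requirement, but since that's your requirement, let's do the math.

Let's say you make a billion unique objects a year (about thirty per second) for 10 trillion trillion years. That's 10⁴⁹ unique objects you're creating. Working out the math is quite easy; the probability of at least one hash collision in that time is above one in 10¹⁸ when the bit size of the hash is less than 384.

Therefore you need at least a 384-bit hash code to get the level of uniqueness you require. That's a convenient size, being 12 int32s. If you're going to make more than 30 objects a second, or want the probability to be less than one in 10¹⁸, then more bits will be necessary.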
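
The 384-bit figure can be sanity-checked with the usual birthday-bound approximation. This is only a sketch: it assumes uniformly distributed hash values and a small collision probability, and the symbols n (number of objects), b (hash width in bits) and P are introduced here for illustration.

```latex
\begin{align*}
P(\text{collision}) &\approx \frac{n^{2}}{2^{\,b+1}}, \qquad n = 10^{49} \\
n^{2} &= 10^{98} \approx 2^{325.5} \\
b = 384: \quad P &\approx 2^{325.5 - 385} = 2^{-59.5} \approx 1.3 \times 10^{-18}
\end{align*}
```

Each bit removed from b doubles P, so anything narrower than 384 bits moves the probability further above one in 10¹⁸.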

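Why do you have such stringent requirements?

Here's what I would do if I had your stated requirements. The first problem is to convert every possible datum into a self-describing sequence of bits. If you already have a serialization format, use that. If not, invent one that can serialize all the possible objects you are interested in hashing.

Then, to hash the object, serialize it into a byte array and run the byte array through the SHA-384 or SHA-512 hashing algorithm. That produces a professional crypto-grade 384- or 512-bit hash that is believed to be unique even in the face of attackers trying to force collisions. That many bits should be more than enough to keep the probability of a collision low over your ten-trillion-trillion-year timeframe.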
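
The two steps above (serialize to self-describing bytes, then hash with SHA-384) might look roughly like the sketch below. The `Order` type, the `ContentHash` class and the tag bytes are hypothetical and exist only for illustration; a real serialization format would have to cover every type you want to hash.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical type used only for this sketch.
public sealed class Order
{
    public string Customer { get; set; } = "";
    public DateTime PlacedAt { get; set; }
    public int[] ItemIds { get; set; } = Array.Empty<int>();
}

public static class ContentHash
{
    // Each field gets a tag byte, and variable-length data gets a length prefix,
    // so the byte sequence describes its own structure.
    public static byte[] Serialize(Order order)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write((byte)1);               // tag: string
            writer.Write(order.Customer);        // BinaryWriter length-prefixes strings
            writer.Write((byte)2);               // tag: DateTime
            writer.Write(order.PlacedAt.Ticks);
            writer.Write((byte)order.PlacedAt.Kind);
            writer.Write((byte)3);               // tag: int array
            writer.Write(order.ItemIds.Length);
            foreach (int id in order.ItemIds)
                writer.Write(id);
        }
        return stream.ToArray();
    }

    // Returns the SHA-384 digest of the serialized bytes as 96 hex characters.
    public static string Sha384Hex(Order order)
    {
        using var sha = SHA384.Create();
        byte[] digest = sha.ComputeHash(Serialize(order));
        return BitConverter.ToString(digest).Replace("-", "");
    }
}
```

Two `Order` instances built independently with the same contents yield the same string, while reordering `ItemIds` changes it, because the array is written in order.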

余生共白头 2024-11-06 18:56:43

If you need to create a unique hash code, then you're basically talking about a number which can represent as many states as your type can have. For DateTime that means taking the Ticks value and the DateTimeKind, I believe.

You may be able to get away with assuming that the top two bits of the Ticks property are going to be zero, and using those to store the kind. As far as I can tell that holds for every representable DateTime, since DateTime.MaxValue.Ticks (3,155,378,975,999,999,999) is below 2^62 (4,611,686,018,427,387,904):

private static ulong Hash(DateTime when)
{
    ulong kind = (ulong) (int) when.Kind;
    return (kind << 62) | (ulong) when.Ticks;
}
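
One way to convince yourself that this packing loses no information is to reverse it. The sketch below repeats the `Hash` method and adds an `Unpack` helper; the `DateTimeKey` class and `Unpack` are names assumed for this example and are not part of the answer.

```csharp
using System;

public static class DateTimeKey
{
    // Same packing as above: kind in the top two bits, ticks in the lower 62.
    public static ulong Hash(DateTime when)
    {
        ulong kind = (ulong)(int)when.Kind;
        return (kind << 62) | (ulong)when.Ticks;
    }

    // Reverses Hash, which shows that distinct (Ticks, Kind) pairs give distinct keys.
    public static DateTime Unpack(ulong key)
    {
        var kind = (DateTimeKind)(int)(key >> 62);
        long ticks = (long)(key & ((1UL << 62) - 1));
        return new DateTime(ticks, kind);
    }
}
```

Because every key maps back to exactly one tick count and kind, two DateTime values share a key only when both of those match.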

一曲爱恨情仇 2024-11-06 18:56:43

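What you're talking about here is not a hash code; you need a numeric representation of your state. To make it unique, it may have to be incredibly large, depending on your object structure.

Why do I need to write this? I'm writing a caching layer using PostSharp.

Why don't you use a regular hash code instead, and handle collisions by actually comparing the objects? That seems to be the most reasonable approach.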
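
A minimal sketch of that approach, assuming a hypothetical `Query` key type: the hash code only selects a bucket, and the comparer's `Equals` decides whether two keys really match. The hash here is deliberately weak, so that collisions occur and still do no harm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache key type for this sketch.
public sealed class Query
{
    public string Table { get; }
    public int Page { get; }
    public Query(string table, int page) { Table = table; Page = page; }
}

public sealed class QueryComparer : IEqualityComparer<Query>
{
    // Full content comparison; this is what resolves hash collisions.
    public bool Equals(Query x, Query y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Table == y.Table && x.Page == y.Page;
    }

    // Deliberately weak: queries on different tables collide when Page matches.
    public int GetHashCode(Query q) => q.Page;
}

public static class QueryCache
{
    private static readonly Dictionary<Query, string> Cache =
        new Dictionary<Query, string>(new QueryComparer());

    // Returns the cached value for an equal key, or loads and stores it.
    public static string GetOrAdd(Query key, Func<Query, string> load)
    {
        if (!Cache.TryGetValue(key, out string value))
        {
            value = load(key);
            Cache[key] = value;
        }
        return value;
    }
}
```

Looking up `new Query("users", 1)` and `new Query("orders", 1)` lands in the same bucket, yet each gets its own entry, and a second independently constructed `new Query("users", 1)` is a hit.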

束缚ｍ 2024-11-06 18:56:43

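An addition to BrokenGlass's answer, which I have voted up and consider correct:

Using the GetHashCode/Equals approach means that if two objects hash to the same value, you'll be relying on their Equals implementation to tell you whether they are equivalent.

Unless these objects override Equals (which in practice means that they implement IEquatable<T>, where T is their type), the default implementation of Equals does a reference comparison. This in turn means that your cache would mistakenly produce a miss for objects that are "equal" in the business sense but have been constructed independently.

Consider the usage model for your cache carefully: if you end up using it for classes that are not IEquatable<T>, in a way where you expect to compare objects that are not reference-equal, the cache will end up being completely useless.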
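
The difference is easy to reproduce. In this sketch `PlainKey`, `ValueKey` and `EqualityDemo` are hypothetical names: the first class inherits reference equality from `object`, while the second implements `IEquatable<T>` and overrides `Equals` and `GetHashCode` consistently.

```csharp
using System;
using System.Collections.Generic;

// No Equals override: lookups fall back to reference equality.
public sealed class PlainKey
{
    public int Id { get; }
    public PlainKey(int id) { Id = id; }
}

// Value-based equality, with GetHashCode kept consistent with Equals.
public sealed class ValueKey : IEquatable<ValueKey>
{
    public int Id { get; }
    public ValueKey(int id) { Id = id; }

    public bool Equals(ValueKey other) => !(other is null) && other.Id == Id;
    public override bool Equals(object obj) => Equals(obj as ValueKey);
    public override int GetHashCode() => Id;
}

public static class EqualityDemo
{
    // False: the second PlainKey(1) is a different reference.
    public static bool PlainKeyHits()
    {
        var cache = new Dictionary<PlainKey, string> { [new PlainKey(1)] = "cached" };
        return cache.ContainsKey(new PlainKey(1));
    }

    // True: ValueKey compares by Id.
    public static bool ValueKeyHits()
    {
        var cache = new Dictionary<ValueKey, string> { [new ValueKey(1)] = "cached" };
        return cache.ContainsKey(new ValueKey(1));
    }
}
```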

横笛休吹塞上声 2024-11-06 18:56:43

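I can't use .GetHashCode() because it might generate the same hash code for objects with different contents.

It's quite normal for hash codes to have collisions. If your hash code has a fixed length (32 bits in the case of the standard .NET hash code), then you are bound to have collisions for any type whose range of values is larger than that (e.g. 64 bits for a long; n*64 bits for an array of n longs, etc.).

In fact, for any hash code with a finite length of N bits, there will always be collisions among collections of more than 2^N distinct elements.

What you're asking for is not feasible in the general case.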
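
With a 32-bit hash you do not need anywhere near 2^32 values to see this in practice. The sketch below, with the hypothetical names `CollisionSearch` and `FindCollision`, hashes generated strings until two different ones share a `GetHashCode()` value. It assumes string hash codes are reasonably well spread, in which case a pair usually turns up after roughly a hundred thousand candidates.

```csharp
using System;
using System.Collections.Generic;

public static class CollisionSearch
{
    // Returns two different strings with the same GetHashCode(),
    // or null if none is found among the first maxKeys candidates.
    public static Tuple<string, string> FindCollision(int maxKeys)
    {
        var seen = new Dictionary<int, string>();
        for (int i = 0; i < maxKeys; i++)
        {
            string key = "key" + i;
            int hash = key.GetHashCode();
            if (seen.TryGetValue(hash, out string earlier))
                return Tuple.Create(earlier, key);
            seen[hash] = key;
        }
        return null;
    }
}
```

Which pair is found can differ from run to run, because newer .NET runtimes randomize string hash codes per process.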

风吹短裙飘 2024-11-06 18:56:43

We had exactly the same requirement, and here is the function I came up with. It works well for the types of objects we need to cache:

public static string CreateCacheKey(this object obj, string propName = null)
{
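    var sb = new StringBuilder();
    if (obj == null)
        // A null collection item would otherwise throw on obj.GetType(); emit an empty value instead.
        sb.AppendFormat("{0}_|", propName);
    else if (obj.GetType().IsValueType || obj is string)
        sb.AppendFormat("{0}_{1}|", propName, obj);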
    else
        foreach (var prop in obj.GetType().GetProperties())
        {
            if (typeof(IEnumerable<object>).IsAssignableFrom(prop.PropertyType))
            {
                var get = prop.GetGetMethod();
                if (!get.IsStatic && get.GetParameters().Length == 0)
                {
                    var collection = (IEnumerable<object>)get.Invoke(obj, null);
                    if (collection != null)
                        foreach (var o in collection)
                            sb.Append(o.CreateCacheKey(prop.Name));
                }
            }
            else
                sb.AppendFormat("{0}{1}_{2}|", propName, prop.Name, prop.GetValue(obj, null));

        }
    return sb.ToString();
}

So, for example, if we have something like this:

var bar = new Bar()
{
    PropString = "test string",
    PropInt = 9,
    PropBool = true,
    PropListString = new List<string>() {"list string 1", "list string 2"},
    PropListFoo =
        new List<Foo>()
            {new Foo() {PropString = "foo 1 string"}, new Foo() {PropString = "foo 2 string"}},
    PropListTuple =
        new List<Tuple<string, int>>()
            {
                new Tuple<string, int>("tuple 1 string", 1), new Tuple<string, int>("tuple 2 string", 2)
            }
};

var cacheKey = bar.CreateCacheKey();

The cache key generated by the method above will be:

PropString_test string|PropInt_9|PropBool_True|PropListString_list string 1|PropListString_list string 2|PropListFooPropString_foo 1 string|PropListFooPropString_foo 2 string|PropListTupleItem1_tuple 1 string|PropListTupleItem2_1|PropListTupleItem1_tuple 2 string|PropListTupleItem2_2|

阳光下慵懒的猫 2024-11-06 18:56:43

You can calculate, for example, an MD5 sum (or something similar) from the object serialized to JSON.
If you want only some properties to matter, you can create an anonymous object on the way:

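public static string GetChecksum(this YourClass obj)
{
    // Copy only the properties that should affect the checksum.
    var copy = new
    {
        obj.Prop1,
        obj.Prop2
    };
    var json = JsonConvert.SerializeObject(copy);

    // CalculateMD5Hash is a string extension method that is not shown in this answer.
    return json.CalculateMD5Hash();
}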

I use that to check whether someone has tampered with my database of license-based data. You can also append a seed to the json variable to complicate things.

你不是我要的菜∠ 2024-11-06 18:56:43

Would this extension method suit your purposes? If the object is a value type or a string, it returns a value derived from the object's own hash code. Otherwise, it recursively gets the value of each property and combines them into a single hash.

using System;
using System.Reflection;

public static class HashCode
{
    public static ulong CreateHashCode(this object obj)
    {
        ulong hash = 0;
        Type objType = obj.GetType();

        if (objType.IsValueType || obj is string)
        {
            unchecked
            {
                // Widen to ulong before multiplying; a uint product would wrap at 32 bits.
                hash = (ulong)(uint)obj.GetHashCode() * 397;
            }

            return hash;
        }

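        unchecked
        {
            foreach (PropertyInfo property in objType.GetProperties())
            {
                // Indexers cannot be read without arguments, so skip them.
                if (property.GetIndexParameters().Length > 0)
                    continue;

                object value = property.GetValue(obj, null);

                // Multiply before mixing so that property order matters and equal
                // values do not cancel out, as they would with a plain XOR.
                hash = (hash * 397) ^ (value == null ? 0UL : value.CreateHashCode());
            }
        }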

        return hash;
    }
}

指尖上的星空 2024-11-06 18:56:43

Some of the answers here serialize to JSON and generate an MD5 hash from that. This works most of the time, except when you have collections and the item order differs: the same object could then generate different hashes purely because of collection order.

The solution I came up with is below. I serialize to JSON (using Newtonsoft Json.NET) and order any child collections by hashing each item and sorting by that hash. This gives a deterministic serialized representation that we can generate a hash from.

There might be some scenarios I'm not fully accounting for, but this works for the nested collections of complex objects for most common scenarios.

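using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

static class ObjectHashGenerator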
{
    private static readonly OrderedPropertiesContractResolver ContractResolver = new();
    private static readonly OrderedCollectionConverter Converter = new();
    private static readonly IList<JsonConverter> Converters = new List<JsonConverter>(new[] { Converter });
    private static readonly JsonSerializerSettings Settings = new()
    {
        ContractResolver = ContractResolver,
        Converters = Converters
    };
    
    public static string GenerateHash(this object item)
    {
        var serializedItem = JsonConvert.SerializeObject(item, Settings);
        var hash = GenerateMd5(serializedItem);
        return hash;
    }

    public static string GenerateMd5(string input)
    {
        using var md5 = MD5.Create();
        var inputBytes = Encoding.UTF8.GetBytes(input);
        var hashBytes = md5.ComputeHash(inputBytes);
        return Convert.ToHexString(hashBytes);
    }
}

sealed class OrderedPropertiesContractResolver : DefaultContractResolver
{
    protected override IList<JsonProperty> CreateProperties(Type type, MemberSerialization memberSerialization)
    {
        var props = base.CreateProperties(type, memberSerialization);
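        // Ordinal comparison keeps the property order independent of the current culture.
        return props.OrderBy(p => p.PropertyName, StringComparer.Ordinal).ToList();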
    }
}

sealed class OrderedCollectionConverter : JsonConverter
{
    public override bool CanConvert(Type type)
    {
        if (type == typeof(string)) 
            return false;
        
        return typeof(IEnumerable).IsAssignableFrom(type);
    }

    public override void WriteJson(JsonWriter writer, object? value, JsonSerializer serializer)
    {
        if (value is not IEnumerable enumerable) 
            return;
        
        var itemsJson = new List<string>();
        
        foreach (var item in enumerable)
        {
            var stringBuilder = new StringBuilder();
            using var stringWriter = new StringWriter(stringBuilder);
            serializer.Serialize(stringWriter, item); 
            
            var result = stringBuilder.ToString();
            itemsJson.Add(result);
            stringBuilder.Clear();
        }
        
        // We order each collection by hash of the item so the serialized JSON is deterministically 
        // created so the hash can be the same for objects regardless of collection order on the original.
        writer.WriteStartArray();
        foreach (var item in itemsJson.OrderBy(ObjectHashGenerator.GenerateMd5))
            writer.WriteRawValue(item);
        writer.WriteEndArray();
    }

    public override object ReadJson(JsonReader reader, Type type, object? existingValue, JsonSerializer serializer)
    {
        // This converter is only used for serialization in order to generate a hash
        throw new NotImplementedException();
    }
}

完美的未来在梦里 2024-11-06 18:56:43

Generic Extension Method

using System;
using System.Collections;

public static class GenericExtensions
{
    public static int GetDeepHashCode<T>(this T obj)
    {
        if (obj == null)
            return 0;

        // Use the runtime type: T is object for values that come back from reflection.
        var type = obj.GetType();

        // Value types and strings provide their own hash code.
        if (type.IsValueType || obj is string)
            return obj.GetHashCode();

        var result = 0;

        unchecked
        {
            if (obj is IEnumerable enumerable)
            {
                // Weight each element by its position so that order affects the result.
                var i = 1;

                foreach (var current in enumerable)
                {
                    result += current.GetDeepHashCode() * i;
                    i++;
                }

                return result;
            }

            foreach (var property in type.GetProperties())
            {
                // Indexers cannot be read without arguments, so skip them.
                if (property.GetIndexParameters().Length > 0)
                    continue;

                result += property.GetValue(obj).GetDeepHashCode();
            }
        }

        return result;
    }
}
