Mathematica 使用的默认哈希码是什么?
在线文档说
Hash[expr]
gives an integer hash code for the expression expr.
Hash[expr,"type"]
gives an integer hash code of the specified type for expr.
它还给出了“可能的哈希代码类型”:
- “Adler32”Adler 32位循环冗余校验
- “CRC32”32位循环冗余校验
- “MD2”128位MD2代码
- “MD5” " 128 位 MD5 代码
- "SHA" 160 位 SHA-1 代码
- "SHA256" 256 位 SHA 代码
- "SHA384" 384 位 SHA 代码
- "SHA512" 512 位 SHA 代码
然而,这些都不对应于返回的默认值哈希[expr]
。
所以我的问题是:
- 默认的
Hash
使用什么方法? - 是否有其他内置的哈希码?
The online documentation says
Hash[expr]
gives an integer hash code for the expression expr.
Hash[expr,"type"]
gives an integer hash code of the specified type for expr.
It also gives "possible hash code types":
- "Adler32" Adler 32-bit cyclic redundancy check
- "CRC32" 32-bit cyclic redundancy check
- "MD2" 128-bit MD2 code
- "MD5" 128-bit MD5 code
- "SHA" 160-bit SHA-1 code
- "SHA256" 256-bit SHA code
- "SHA384" 384-bit SHA code
- "SHA512" 512-bit SHA code
Yet none of these correspond to the default returned by Hash[expr]
.
So my questions are:
- What method does the default
Hash
use? - Are there any other hash codes built in?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
默认哈希算法或多或少是应用于底层表达式表示的基本 32 位哈希函数,但确切的代码是 Mathematica 内核的专有组件。它会在 Mathematica 版本之间发生(并且已经)发生变化,并且缺乏许多理想的加密属性,因此我个人建议您对安全性很重要的任何重要应用程序使用 MD5 或 SHA 变体之一。内置哈希旨在用于典型的数据结构用途(例如,在哈希表中)。
您从文档中列出的命名哈希算法是当前唯一可用的算法。您是否正在特别寻找另一款?
The default hash algorithm is, more-or-less, a basic 32-bit hash function applied to the underlying expression representation, but the exact code is a proprietary component of the Mathematica kernel. It's subject to (and has) change between Mathematica versions, and lacks a number of desirable cryptographic properties, so I personally recommend you use MD5 or one of the SHA variants for any serious application where security matters. The built-in hash is intended for typical data structure use (e.g. in a hash table).
The named hash algorithms you list from the documentation are the only ones currently available. Are you looking for a different one in particular?
我一直在 32 位和 64 位 Windows 版本的 Mathematica 10.4 上进行一些逆向工程,这就是我发现的:
32 BIT
它使用 Fowler–Noll–Vo 哈希函数(FNV-1,之前有乘法),16777619 为FNV prime 和 84696351 作为抵消基础。此函数应用于表达式数据地址的 Murmur3-32 哈希值(MMA 使用指针以保留每个数据的一个实例)。地址最终被解析为值 - 对于简单的机器整数,该值是立即的,对于其他机器整数来说有点棘手。 Murmur3-32 实现函数实际上包含一个附加参数(默认为 4,特殊情况下的行为与维基百科相同),用于选择从输入中的表达式结构中选择多少位。由于普通表达式在内部表示为指针数组,因此可以通过向表达式的基指针重复添加 4(字节 = 32 位)来获取第一个、第二个等。因此,将 8 传递给函数将给出第二个指针,12 将给出第三个指针,依此类推。由于内部结构(大整数、机器整数、机器实数、大实数等)具有不同的成员变量(例如,机器整数只有一个指向 int 的指针,一个复数 2 个指向数字的指针等。),对于每个表达式结构有一个“包装器”,将其内部成员组合在一个 32 位哈希中(基本上是 FNV-1 轮)。最简单的哈希表达式是整数。
murmur3_32() 函数将 1131470165 作为种子,n=0 和维基百科中的其他参数。
所以我们有:
其中“^”表示异或。
我真的没有尝试过 - 指针是使用 WINAPI EncodePointer() 进行编码的,因此它们不能在运行时被利用。 (可能值得在带有
EncodePonter
修改版本的 Wine 下运行?)64 BIT
它使用 FNV-1 64 位哈希函数,以 0xAF63BD4C8601B7DF 作为偏移基础,以 0x100000001B3 为基础作为 FNV prime,以及 SIP64-24 哈希 (这里是参考代码),其中 0x0AE3F68FE7126BBF76F98EF7F39DE1521 的前 64 位为 k0,最后 64 位为 k1。该函数应用于表达式的基指针并在内部解析。与 32 位的 murmur3 一样,有一个附加参数(默认为 8)来选择从输入表达式结构中选择多少个指针。对于每种表达式类型,都有一个包装器,用于通过 FNV-1 64 位轮将结构成员压缩为单个散列。
对于机器整数,我们有:
再说一遍,我并没有真正尝试过。有人可以尝试吗?
不适合胆小的人
如果你看一下他们的注释在内部实现上,他们说“每个表达式都包含一种特殊形式的哈希码,用于模式匹配和评估”。
他们所指的哈希码是由这些函数生成的哈希码 - 在正常表达式包装函数中的某个时刻,有一个赋值将计算出的哈希值放入表达式结构本身中。
了解他们如何利用这些哈希值进行模式匹配肯定会很酷。因此,我尝试运行 bigInteger 包装器,看看会发生什么 - 这是最简单的复合表达式。
它开始检查返回 1 的内容 - 不知道是什么。
所以它执行
使用 hashMachineInteger() 就是我们之前所说的 - 包括值。
然后它从结构体 (
bignum_length
) 中读取 bigInt 的长度(以字节为单位)并运行请注意,如果
4 * bignum_length
大于 8,则调用murmur3_32()
(可能与机器整数的最大值有关$MaxMachineNumber
< code>2^32^32 并与 bigInt 应该是什么相反)。所以,最终的代码是
我对这个结构的特性做了一些假设。许多 XOR 的存在以及
16777619 + 67918732 = 84696351
的事实可能会让人认为某种循环结构被用来检查模式 - 即减去偏移量并除以素数,或者其他什么像那样。 Cassandra 软件使用 Murmur 哈希算法生成令牌 - 请参阅 这些图像表达了我对“循环结构”的理解。也许每个表达式使用了不同的素数 - 仍然必须检查。希望有帮助
I've been doing some reverse engeneering on 32 and 64 bit Windows version of Mathematica 10.4 and that's what I found:
32 BIT
It uses a Fowler–Noll–Vo hash function (FNV-1, with multiplication before) with 16777619 as FNV prime and 84696351 as offset basis. This function is applied on Murmur3-32 hashed value of the address of expression's data (MMA uses a pointer in order to keep one instance of each data). The address is eventually resolved to the value - for simple machine integers the value is immediate, for others is a bit trickier. The Murmur3-32 implementing function contains in fact an additional parameter (defaulted with 4, special case making behaving as in Wikipedia) which selects how many bits to choose from the expression struct in input. Since a normal expression is internally represented as an array of pointers, one can take the first, the second etc.. by repeatedly adding 4 (bytes = 32 bit) to the base pointer of the expression. So, passing 8 to the function will give the second pointer, 12 the third and so on. Since internal structs (big integers, machine integers, machine reals, big reals and so on) have different member variables (e.g. a machine integer has only a pointer to int, a complex 2 pointers to numbers etc..), for each expression struct there is a "wrapper" that combine its internal members in one single 32-bit hash (basically with FNV-1 rounds). The simplest expression to hash is an integer.
The
murmur3_32()
function has 1131470165 as seed, n=0 and other params as in Wikipedia.So we have:
with " ^ " meaning XOR.
I really didn't try it - pointers are encoded using
WINAPI EncodePointer()
, so they can't be exploited at runtime. (May be worth running in Linux under Wine with a modified version ofEncodePonter
?)64 BIT
It uses a FNV-1 64 bit hash function with 0xAF63BD4C8601B7DF as offset basis and 0x100000001B3 as FNV prime, along with a SIP64-24 hash (here's the reference code) with the first 64 bit of 0x0AE3F68FE7126BBF76F98EF7F39DE1521 as k0 and the last 64 bit as k1. The function is applied to the base pointer of the expression and resolved internally. As in 32-bit's murmur3, there is an additional parameter (defaulted to 8) to select how many pointers to choose from the input expression struct. For each expression type there is a wrapper to condensate struct members into a single hash by means of FNV-1 64 bit rounds.
For a machine integer, we have:
Again, I didn't really try it. Could anyone try?
Not for the faint-hearted
If you take a look at their notes on internal implementation, they say that "Each expression contains a special form of hash code that is used both in pattern matching and evaluation."
The hash code they're referring to is the one generated by these functions - at some point in the normal expression wrapper function there's an assignment that puts the computed hash inside the expression struct itself.
It would certainly be cool to understand HOW they can make use of these hashes for pattern matching purpose. So I had a try running through the bigInteger wrapper to see what happens - that's the simplest compound expression.
It begins checking something that returns 1 - dunno what.
So it executes
with hashMachineInteger() is what we said before - including values.
Then it reads the length in bytes of the bigInt from the struct (
bignum_length
) and runsNote that
murmur3_32()
is called if4 * bignum_length
is greater than 8 (may be related to the max value of machine integers$MaxMachineNumber
2^32^32
and by converse to what a bigInt is supposed to be).So, the final code is
I've made some hypoteses on the properties of this construction. The presence of many XORs and the fact that
16777619 + 67918732 = 84696351
may make one think that some sort of cyclic structure is exploited to check patterns - i.e. subtracting the offset and dividing by the prime, or something like that. The software Cassandra uses the Murmur hash algorithm for token generation - see these images for what I mean with "cyclic structure". Maybe various primes are used for each expression - must still check.Hope it helps
看来Hash调用了内部Data`HashCode函数,然后除以2,取N[..]的前20位,然后取IntegerPart,加一,即:
It seems that Hash calls the internal Data`HashCode function, then divides it by 2, takes the first 20 digits of N[..] and then the IntegerPart, plus one, that is: