How to quickly generate the hash of a new string after concatenating 2 strings

Posted 2024-08-27 19:26:50 · 1064 characters · 7 views · 0 comments

If my math is right, I can quickly generate a new hash value for the concatenation of two strings if I already have the individual hash values for each string. But only if the hash function is of the form:

hash(n) = k * hash(n-1) + c(n), with hash(0) = 0 and c(n) the n-th character.

In this case,

hash( concat(s1,s2) ) = k**length(s2) * hash(s1) + hash(s2)

For example:

h1  = makeHash32_SDBM( "abcdef",          6 );
h2  = makeHash32_SDBM( "ghijklmn",        8 );
h12 = makeHash32_SDBM( "abcdefghijklmn", 14 );
hx  = mod32_powI( 65599, 8 ) * h1 + h2;

h1  = 2534611139
h2  = 2107082500
h12 = 1695963591
hx  = 1695963591

Note that h12 = hx so this demonstrates the idea.
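
The identity above can be checked with a small Python sketch. The names `poly_hash` and `concat_hash` are illustrative and not from the original post, and the recurrence is assumed to run modulo 2**32, as it would for a 32-bit hash such as SDBM with k = 65599.

```python
MASK32 = 0xFFFFFFFF  # assumption: the hash is kept to 32 bits


def poly_hash(data: bytes, k: int = 65599, h0: int = 0) -> int:
    """hash(n) = k * hash(n-1) + c(n), starting from h0, modulo 2**32."""
    h = h0
    for c in data:
        h = (k * h + c) & MASK32
    return h


def concat_hash(h1: int, h2: int, len2: int, k: int = 65599) -> int:
    """Combine hash(s1) and hash(s2) into hash(s1 + s2); valid only for h0 = 0."""
    # Three-argument pow computes k**len2 modulo 2**32 without huge intermediates.
    return (pow(k, len2, 1 << 32) * h1 + h2) & MASK32


s1, s2 = b"abcdef", b"ghijklmn"
h1, h2 = poly_hash(s1), poly_hash(s2)
h12 = poly_hash(s1 + s2)          # hashed directly
hx = concat_hash(h1, h2, len(s2))  # combined from the two partial hashes
print(h12 == hx)
```

Only the length of the second string is needed in addition to the two hash values, so the combination costs O(log length(s2)) multiplications rather than a full rescan.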

For the SDBM hash, k = 65599. The DJB hash has k = 33 (or perhaps 31?) and h(0) = 5381, so to make the identity hold you can set h(0) = 0 instead.
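
If changing h(0) is not an option, the nonzero seed can be corrected for instead. With a seed h(0), every hash carries an extra term k**length * h(0), so subtracting k**length(s2) * h(0) from the combined value should restore the identity. A sketch in Python, where `djb2` and `djb2_concat` are hypothetical helper names, and k = 33 with h(0) = 5381 is assumed:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit wraparound


def djb2(data: bytes, h0: int = 5381, k: int = 33) -> int:
    """Additive DJB-style hash: h = k * h + c, starting from h0."""
    h = h0
    for c in data:
        h = (k * h + c) & MASK32
    return h


def djb2_concat(h1: int, h2: int, len2: int, h0: int = 5381, k: int = 33) -> int:
    """Combine two seeded hashes into the hash of the concatenation."""
    kp = pow(k, len2, 1 << 32)
    # h2 already contains kp * h0 from its own seed, so remove that term once.
    return (kp * h1 + h2 - kp * h0) & MASK32


h_abc, h_def = djb2(b"abc"), djb2(b"def")
print(djb2_concat(h_abc, h_def, 3) == djb2(b"abcdef"))
```

This relies on the hash being additive; it does not carry over to the xor variant.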

However, a modified version of the DJB hash uses xor instead of + to mix in each character.

http://www.cse.yorku.ca/~oz/hash.html

Is there another technique to quickly calculate the hash value of concatenated strings if the hash function uses xor instead of +?

Comments (1)

烟沫凡尘 2024-09-03 19:26:50

That would be possible if the second hash could be expressed as a function of the hash's initial state. For some kinds of hash function it is easy to adjust the result for a new initial state (for example, an xor of the words, or their sum). But in the general case that is almost impossible (otherwise, hash + "salt" schemes in password matching would be much easier to break).

But usually you can take the result of hashing the first string and then continue feeding in data from the second string.

Update:
I guess there is no way to find a function f(x, y) such that:

h_abc = hashOf(h0, "abc")  
h_def = hashOf(h0, "def")  
(h_abcdef = f(h_abc, h_def)) == hashOf(h0, "abcdef")  

But you can do this:

h_abc = hashOf(h0, "abc")  
(h_abcdef = hashOf(h_abc, "def")) == hashOf(h0, "abcdef")  
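
The "continue feeding" approach can be sketched in Python as follows. Here `djb2_xor` is a hypothetical stand-in for `hashOf`, assumed to be the xor variant of the DJB hash with multiplier 33 and a 32-bit state; the initial state is passed explicitly so that hashing can resume from a previous result.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit state


def djb2_xor(data: bytes, state: int = 5381) -> int:
    """xor variant of the DJB hash: h = (h * 33) xor c, resumable from `state`."""
    h = state
    for c in data:
        h = ((h * 33) ^ c) & MASK32
    return h


h_abc = djb2_xor(b"abc")                    # hash of the first string
h_abcdef = djb2_xor(b"def", state=h_abc)    # resume with the second string
print(h_abcdef == djb2_xor(b"abcdef"))
```

This still scans the second string once, so it saves only the work of rehashing the first string.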

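One reason you can't do that is that 33 is not a power of 2, so multiplying by it does not distribute over xor. If the multiplier were 32 (2**5), the multiplication would be a left shift, which does distribute over xor, and then (with h0 = 0):

h_abcdef == (h_abc << (5*len("def"))) xor h_def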
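
To illustrate the difference between the two multipliers, here is a small Python sketch. The helpers `xor_mul_hash` and `naive_concat` are hypothetical names, the state is assumed to be 32 bits wide with an initial value of 0, and the shift is taken as 5 bits per character of the second string.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit state, initial value 0


def xor_mul_hash(data: bytes, k: int) -> int:
    """h = (k * h) xor c, starting from 0."""
    h = 0
    for c in data:
        h = ((k * h) ^ c) & MASK32
    return h


def naive_concat(h1: int, h2: int, len2: int, k: int) -> int:
    """Try to combine two hashes as (k**len2 * h1) xor h2."""
    return ((pow(k, len2, 1 << 32) * h1) ^ h2) & MASK32


# k = 32: multiplying is a left shift by 5 bits, so the combination matches.
ok_32 = naive_concat(xor_mul_hash(b"abc", 32), xor_mul_hash(b"defgh", 32), 5, 32) \
    == xor_mul_hash(b"abcdefgh", 32)

# k = 33: carries from the multiplication interact with xor, so it does not.
ok_33 = naive_concat(xor_mul_hash(b"a", 33), xor_mul_hash(b"bc", 33), 2, 33) \
    == xor_mul_hash(b"abc", 33)

print(ok_32, ok_33)
```

Note that a multiplier of 32 makes for a weak hash: in a 32-bit state a character is shifted out entirely once seven or more characters follow it, so the result depends only on the tail of the string.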