Is it possible to get back the original value of a number after several multiplications **with overflow**?
Summary: Suppose I have an `unsigned int` number. I then multiply it several times (with overflow, which is expected). Is it possible to recover the original value?
In detail:
This is all about the Rabin-Karp rolling hash. I have the hash of a long string, for example "abcd", and the hash of a shorter suffix of it, for example "cd". How can I calculate the hash of "ab" in O(1), using the two given hashes?
My current algorithm:
- subtract the "cd" hash from the "abcd" hash (remove the last terms of the polynomial)
- divide the result by `p ^ len( "cd" )`, where `p` is the base (a prime number).
So, with the hash of "abcd" being
a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0
and the hash of "cd" being
c * p ^ 1 + d * p ^ 0
the hash of "ab" is:
( ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) - ( c * p ^ 1 + d * p ^ 0 ) ) / ( p ^ 2 ) = a * p ^ 1 + b * p ^ 0
This works if there is no overflow (if `p` is a small number). Once the values overflow, it stops working.
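To make the failure concrete, here is a minimal sketch of the subtract-then-divide approach, assuming a 32-bit hash that wraps modulo 2^32. The names `poly_hash` and `naive_prefix_hash` and the base 1000003 are illustrative choices, not taken from the post.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Polynomial hash evaluated with Horner's rule.
// Unsigned arithmetic wraps modulo 2^32, which is the "expected overflow".
std::uint32_t poly_hash(const std::string& s, std::uint32_t p) {
    std::uint32_t h = 0;
    for (unsigned char ch : s) h = h * p + ch;
    return h;
}

// The approach from the question: subtract the suffix hash, then use
// ordinary integer division by p^suffix_len (itself computed with wraparound).
std::uint32_t naive_prefix_hash(std::uint32_t h_full, std::uint32_t h_suffix,
                                std::uint32_t p, std::size_t suffix_len) {
    std::uint32_t pw = 1;
    for (std::size_t i = 0; i < suffix_len; ++i) pw *= p;
    return (h_full - h_suffix) / pw;
}
```

With `p = 3` nothing wraps for "abcd" and the result matches the hash of "ab". With `p = 1000003` both the hash and `p^2` wrap, and integer division no longer undoes the multiplication.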
Is there any trick or something?
P.S. The `c++` tag is there because of the overflow behaviour, which is language-specific (and different from Python, Scheme, etc.).
Comments (6)
I don't know about the overflow part, but there is a way of getting back the original value.
The Chinese Remainder Theorem helps a great deal. Let's call `h = abcd - cd`. `G` is the value of `h` without overflows, so `G = h + k*2^32`, assuming the overflow simply does `% 2^32`. And thus `ab = G / p^2`.
If `p^2` and `2^32` are coprime, the Chinese Remainder Theorem gives us
G = h * b * p^2 (mod 2^32 * p^2)
where `b` is the modular multiplicative inverse of `p^2` modulo `2^32`, that is, `b * p^2 = 1 (mod 2^32)`. After you calculate `G`, simply divide by `p^2` to find `ab`. In other words, `ab = h * b (mod 2^32)`.
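The answer does not say how to obtain `b`. As one possible sketch, assuming a 32-bit hash and an odd base, the inverse can be computed with Newton's iteration modulo a power of two (a technique swapped in here for illustration, not the answerer's method). The names `inverse_mod_2_32`, `poly_hash` and `prefix_hash` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Inverse of an odd number modulo 2^32 using Newton's iteration.
// For odd a, a * a == 1 (mod 8), so x = a starts with 3 correct low bits;
// every step doubles the number of correct bits.
std::uint32_t inverse_mod_2_32(std::uint32_t a) {
    std::uint32_t x = a;
    for (int i = 0; i < 5; ++i) x *= 2u - a * x;
    return x;
}

// Polynomial hash with wraparound modulo 2^32.
std::uint32_t poly_hash(const std::string& s, std::uint32_t p) {
    std::uint32_t h = 0;
    for (unsigned char ch : s) h = h * p + ch;
    return h;
}

// ab = h * b (mod 2^32), where h = abcd - cd and b is the inverse of p^len.
// Assumes p is odd, so that p^len and 2^32 are coprime.
std::uint32_t prefix_hash(std::uint32_t h_full, std::uint32_t h_suffix,
                          std::uint32_t p, std::size_t suffix_len) {
    std::uint32_t pw = 1;
    for (std::size_t i = 0; i < suffix_len; ++i) pw *= p;
    return (h_full - h_suffix) * inverse_mod_2_32(pw);
}
```

To get O(1) per query, the inverse powers of `p` would be precomputed once for every suffix length that can occur.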
The extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.
There's another way to do this (thanks to a friend of mine (: )
There's a nice article on Wikipedia about the modular multiplicative inverse using Euler's theorem, for the case when `m` and `a` are coprime:
a^(-1) = a^(φ(m) - 1) (mod m)
where `φ(m)` is Euler's totient function.
In my case, `m` (the modulus) is the size of the hash type: `2^32`, `2^64`, etc. (64-bit in my case).
This means that we only need to find the value of `φ(m)`. But think about it: `m == 2 ^ 64`, which guarantees that `m` is coprime with all odd numbers and NOT coprime with any even number. So all we need to do is take the count of all values and divide it by 2, which gives `φ(m) = 2^63`.
Also, we know that the hash type will be unsigned, as otherwise we would have some issues. This gives us the chance to compute the exponent `x = φ(m) - 1` directly.
For 64-bit numbers, `x` is a really big number (19 digits: `9 223 372 036 854 775 807`), but `fast_pow` is really fast, and we can cache the inverse in case we need it for more than one query.
`fast_pow` is a well-known algorithm (fast exponentiation).
Addition: tried on an example, this works perfectly and is very fast.
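The code from this answer did not survive on this page, so the following is only a sketch of what it describes, assuming a 64-bit unsigned hash and an odd base. `fast_pow` here is plain exponentiation by squaring, and `inverse_mod_2_64` is a hypothetical helper name.

```cpp
#include <cstdint>

// Exponentiation by squaring. All arithmetic is unsigned,
// so every product wraps modulo 2^64.
std::uint64_t fast_pow(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    while (exp != 0) {
        if (exp & 1u) result *= base;  // use this bit of the exponent
        base *= base;                  // square for the next bit
        exp >>= 1;
    }
    return result;
}

// Inverse of an odd a modulo 2^64 via Euler's theorem:
// a^(phi(m) - 1) with phi(2^64) = 2^63, so the exponent is 2^63 - 1.
std::uint64_t inverse_mod_2_64(std::uint64_t a) {
    const std::uint64_t x = ~std::uint64_t{0} / 2;  // 9223372036854775807
    return fast_pow(a, x);
}
```

The loop in `fast_pow` runs at most 64 times, and the resulting inverse can be cached per base, as the answer suggests.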
You have `a * b = c mod 2^32` (or mod something else, depending on how you are doing your hash). If you can find `d` such that `b * d = 1 mod 2^32` (or mod whatever), then you can compute `a * b * d = a mod 2^32` and retrieve `a`. If `gcd(b, 2^32) = 1`, then you can use the extended Euclidean algorithm (http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm) to find `x` and `y` such that `b * x + 2^32 * y = 1`, or `b * x = 1 - y * 2^32`, or `b * x = 1 mod 2^32`, so `x` is the number you want to multiply by.
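A minimal sketch of that computation, assuming a 32-bit hash and an odd `b`; the function name `inverse_ext_euclid` is made up for this example. It tracks only the coefficient of `b`, since `y` is not needed.

```cpp
#include <cstdint>

// Extended Euclidean algorithm on (2^32, b).
// Invariant: r0 == x0 * b (mod 2^32) and r1 == x1 * b (mod 2^32).
// For odd b the gcd is 1, so at the end x0 * b == 1 (mod 2^32).
std::uint32_t inverse_ext_euclid(std::uint32_t b) {
    const std::int64_t m = std::int64_t{1} << 32;
    std::int64_t r0 = m, r1 = b;
    std::int64_t x0 = 0, x1 = 1;
    while (r1 != 0) {
        const std::int64_t q = r0 / r1;
        const std::int64_t r2 = r0 - q * r1;
        r0 = r1;
        r1 = r2;
        const std::int64_t x2 = x0 - q * x1;
        x0 = x1;
        x1 = x2;
    }
    // x0 may be negative; bring it into the range [0, 2^32).
    return static_cast<std::uint32_t>(((x0 % m) + m) % m);
}
```

Multiplying a wrapped product `c` by the returned value, again modulo 2^32, gives back `a`.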
You should use unsigned integers to get defined overflow behaviour (modulo 2^N). Signed integer overflow is undefined.
Also, instead of dividing you should multiply by the multiplicative inverse of p modulo the appropriate value. For example, if p=3 and your hash values are 8 bits, multiply by 171 because 171*3=513=2*256+1. The multiplicative inverse exists if p and the modulo value are relatively prime.
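The 8-bit example can be checked directly. This is a small sketch with hypothetical helper names (`mul8`, `undo_times3`); the explicit cast is needed because `uint8_t` operands are promoted to `int` before multiplying.

```cpp
#include <cstdint>

// Multiplication modulo 2^8.
std::uint8_t mul8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a * b);
}

// "Divide by 3" modulo 256 by multiplying with 171, the inverse of 3,
// since 171 * 3 = 513 = 2 * 256 + 1.
std::uint8_t undo_times3(std::uint8_t y) {
    return mul8(y, 171);
}
```

For instance, 200 * 3 wraps to 88 in 8 bits, and 88 * 171 wraps back to 200.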
Just a partial side-answer here: I believe it is not strictly necessary to use unsigned integers. You can use one's complement.
But note that this will have separate representations for -0 and +0, and that you'll probably have to hand-code arithmetic operations along the way.
Some of the processor instructions are agnostic of the integer representation but not all.
So overflow working for you may just be your compiler being nice to you; the C/C++ standards say that signed integer overflow is undefined behaviour (unsigned arithmetic, by contrast, wraps modulo 2^N). So once a signed value has overflowed, there's nothing you can do, because your program ceases to be deterministic.
You might need to rethink the algorithm, or tack on modulo operations / subtractions to fix your algorithm.