字符串 Rabin-Karp 基本数字符号
我正在 Cormen 等人的《算法简介》中阅读有关字符串算法的内容,
以下是有关一些基本数论符号的文本。
注意:在下面的文本中,将 == 称为模等价。
给定一个整数除以另一个整数时的余数的明确定义的概念,可以方便地提供特殊符号来指示余数相等。如果 (a mod n) = (b mod n),我们写 a == b (mod n) 并说 a 等于 b,模 n。换句话说,如果 a 和 b 除以 n 时余数相同,则 a == b (mod n)。等价地,a == b (mod n) 当且仅当 n | (b-a)。 例如,61 == 6(模 11)。另外,-13 == 22 == 2 == (mod 5)。
整数可以根据其模n的余数分为n个等价类。包含整数 a 的模 n 的等价类是
[a]n = {a + kn : k Z} 。
例如,[3]7 = {. 。 。 、-11、-4、3、10、17、。 。 .};该集合的其他表示是[-4]7 和[10]7。
写 a 属于 [b]n 与写 a == b (mod n) 相同。所有此类等价类的集合为
Zn = {[a]n : 0 <= a <= n - 1}。---------->等式1
我在上面的文本中的问题是在等式1中,提到“a”应该在0和n-1之间,但在示例中它给出为-4,而不是在0和6之间,为什么?
除了上面提到的,对于 Rabin-Karp 算法,我们使用两个数字对第三个数字取模的等价性?这意味着什么?
I am reading about String algorithms in Introduction to Algorithms by Cormen etc
Following is text about some elementary number theoretic notations.
Note: In below text refere == as modulo equivalence.
Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If (a mod n) = (b mod n), we write a == b (mod n) and say that a is equivalent to b, modulo n. In other words, a == b (mod n) if a and b have the same remainder when divided by n. Equivalently, a == b (mod n) if and only if n | (b - a).
For example, 61 == 6 (mod 11). Also, -13 == 22 == 2 == (mod 5).
The integers can be divided into n equivalence classes according to their remainders modulo n. The equivalence class modulo n containing an integer a is
[a]n = {a + kn : k Z} .
For example, [3]7 = {. . . , -11, -4, 3, 10, 17, . . .}; other denotations for this set are [-4]7 and [10]7.
Writing a belongs to [b]n is the same as writing a == b (mod n). The set of all such equivalence classes is
Zn = {[a]n : 0 <= a <= n - 1}.----------> Eq 1
My question in above text is in equation 1 it is mentioned that "a" should be between 0 and n-1, but in example it is given as -4 which is not between 0 and 6, why?
In addition to above it is mentioned that for Rabin-Karp algorithm we use equivalence of two numbers modulo a third number? What does this mean?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我会尽力将您推向正确的方向,即使这与编程无关。
其中 -4 的示例是等价类的示例,它是与给定数字等价的所有数字的集合。因此,在 [3]7 中,所有数字都等于(模 7)3,其中包括 -4 以及 17 和 710 以及无穷大的其他数字。
您还可以将同一个类命名为 [10]7,因为每个等于(模 7)到 3 的数字同时等于(模 7)到 10。
最后一个定义给出了一组所有不同< /em> 等价类,并指出对于模 7,正好有 7 个,并且可以由 0 到 6 的数字生成。您也可以说
和 含义保持不变,因为 [0]7与[7]7 相同,[6]7 与[13]7 相同。
I'll try to nudge you in the right direction, even though it's not about programming.
The example with -4 in it is an example of an equivalence class, which is a set of all numbers equivalent to a given number. Thus, in [3]7, all numbers are equivalent (modulo 7) to 3, and that includes -4 as well as 17 and 710 and an infinity of others.
You could also name the same class [10]7, because every number that is equivalent (modulo 7) to 3 is at the same time equivalent (modulo 7) to 10.
The last definition gives a set of all distinct equivalence classes, and states that for modulo 7, there is exactly 7 of them, and can be produced by numbers from 0 to 6. You could also say
and the meaning will remain the same, since [0]7 is the same thing as [7]7, and [6]7 is the same thing as [13]7.
这不是一个编程问题,但没关系......
因为对于某些 x,[-4]n 与 [x]n 是相同的等价类,使得 0 <= x <名词因此,方程 1 利用这一事实来“整理”定义并使所有可能性变得清晰。
Rabin-Karp 算法要求您计算要搜索的子字符串的哈希值。进行散列时,即使对于非常小的字符串,使用使用整个可用域的散列函数也很重要。如果您的哈希值是 32 位整数,并且您只是将连续的 unicode 值相加,那么您的哈希值通常会非常小,从而导致大量冲突。
所以你需要一个可以给你大量答案的函数。不幸的是,这也使您面临整数溢出的可能性。因此,您可以使用模运算来防止比较因溢出而混乱。
This is not a programming question, but never mind...
Because [-4]n is the same equivalence class as [x]n for some x such that 0 <= x < n. So equation 1 takes advantage of the fact to "neaten up" the definition and make all the possibilities distinct.
The Rabin-Karp algorithm requires you to calculate a hash value for the substring you are searching for. When hashing, it is important to use a hash function that uses the whole of the available domain even for quite small strings. If your hash is a 32 bit integer and you just add the successive unicode values together, your hash will usually be quite small resulting in lots of collisions.
So you need a function that can give you large answers. Unfortunately, this also exposes you to the possibility of integer overflow. Hence you use modulo arithmetic to keep the comparisons from being messed up by overflow.