Rabin-Karp string matching is not matching

Posted on 2024-10-06 14:31:09


I've been working on a Rabin-Karp string matching function in C++ and I'm not getting any results out of it. I have a feeling that I'm not computing some of the values correctly, but I don't know which one(s).

Prototype

void rabinKarp(string sequence, string pattern, int d, int q);

Function Implementation

void rabinKarp(string sequence, string pattern, int d, int q)
{
    //d is the |∑|
    //q is the prime number to use to lessen spurious hits
    int n = sequence.length(); //Length of the sequence
    int m = pattern.length(); //Length of the pattern
    double temp = static_cast<double> (m - 1.0);
    double temp2 = pow(static_cast<double> (d), temp); //Exponentiate d
    int h = (static_cast<int>(temp2)) % q; //High Order Position of an m-digit window
    int p = 0; //Pattern decimal value
    int t = 0; //Substring decimal value
    for (int i = 1; i < m; i++) { //Preprocessing
        p = (d*p + (static_cast<int>(pattern[i]) - 48)) % q;
        t = (d*t + (static_cast<int>(sequence[i])-48)) % q;
    }
    for (int s = 0; s < (n-m); s++) { //Matching(Iterate through all possible shifts)
        if (p == t) {
            for (int j = 0; j < m; j++) {
                if (pattern[j] == sequence[s+j]) {
                    cout << "Pattern occurs with shift: " << s << endl;
                }
            }
        }
        if (s < (n-m)) {
            t = (d*(t - ((static_cast<int>(sequence[s+1]) - 48)*h)) + (static_cast<int>(sequence[s + m + 1]) - 48)) % q;
        }
    }
    return;
}

In my function call I pass 2359023141526739921 as the sequence, 31415 as the pattern, 10 as the radix, and 13 as the prime. I expect there to be one actual match and one spurious hit, but I never get the output statement from the matching part of the function. What am I doing wrong?

Thanks in Advance, Madison


若水微香 2024-10-13 14:31:09


The big gotcha in coding Rabin-Karp is the modulo operator. When two numbers X and Y are congruent modulo Q, (X % Q) should equal (Y % Q), but on the C++ compiler you're using they will only be equal if X and Y are both positive or both negative. If X is positive and Y is negative, then (X % Q) will be positive and (Y % Q) will be negative. In fact, (X % Q) - Q == (Y % Q) in this case.
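To make the sign behaviour concrete, here is a tiny sketch. The helper name `rawRemainder` is made up for illustration and is not part of the question's code; it only wraps the built-in `%`.

```cpp
#include <cassert>

// Hypothetical helper for illustration: returns the raw result of the built-in %.
int rawRemainder(int x, int q)
{
    assert(q > 0);
    return x % q;   // since C++11 the result takes the sign of x
}
```

With q = 13, the values 6 and -7 are congruent (they differ by 13), yet `rawRemainder(6, 13)` is 6 and `rawRemainder(-7, 13)` is -7, so comparing the two remainders directly would miss the match.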

The workaround is to check for a negative value after each modulo and, if there is one, add q to the variable, so your preprocessing loop becomes:

    p = (d*p + pattern[i]) % q;
    if ( p < 0 ) p += q;
    t = (d*t + sequence[i]) % q;
    if ( t < 0 ) t += q;

The update of t in the main loop needs a similar check.
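Putting the pieces together, the following is a minimal sketch of how the whole function might look with the negative-remainder check applied. It is not the asker's code: the name `rabinKarpShifts` and the choice to return the shifts in a vector instead of printing them are assumptions made for this example. Besides the modulo check, the sketch also changes a few things in the posted code that look like off-by-one issues: the preprocessing loop starts at index 0, the shift loop runs through `n - m` inclusive, the rolling update drops `sequence[s]` and brings in `sequence[s + m]`, and a hit is reported only when the whole window matches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch only: returns every shift at which pattern occurs in sequence.
// d is the radix (alphabet size), q is the prime modulus.
std::vector<int> rabinKarpShifts(const std::string& sequence,
                                 const std::string& pattern, int d, int q)
{
    assert(d > 0 && q > 0);
    std::vector<int> shifts;
    const int n = static_cast<int>(sequence.length());
    const int m = static_cast<int>(pattern.length());
    if (m == 0 || m > n) return shifts;

    // h = d^(m-1) mod q, reduced at every step so it never grows large.
    long long h = 1 % q;
    for (int i = 1; i < m; ++i) h = (h * d) % q;

    long long p = 0;  // hash of the pattern
    long long t = 0;  // hash of the current window of the sequence
    for (int i = 0; i < m; ++i) {  // note: starts at 0, not 1
        p = (d * p + (pattern[i] - '0')) % q;
        if (p < 0) p += q;
        t = (d * t + (sequence[i] - '0')) % q;
        if (t < 0) t += q;
    }

    for (int s = 0; s <= n - m; ++s) {  // n - m is itself a valid shift
        // A hash hit may be spurious, so confirm it by comparing the window.
        if (p == t && sequence.compare(s, m, pattern) == 0)
            shifts.push_back(s);
        if (s < n - m) {
            // Drop the leading character sequence[s], append sequence[s + m].
            t = (d * (t - (sequence[s] - '0') * h) + (sequence[s + m] - '0')) % q;
            if (t < 0) t += q;  // the check described above
        }
    }
    return shifts;
}
```

Called with the values from the question (sequence 2359023141526739921, pattern 31415, radix 10, prime 13), this sketch reports the single shift 6. The window 67399 at shift 12 also hashes to 7 modulo 13, the same value as the pattern, but the comparison rejects it as a spurious hit.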


榕城若虚 2024-10-13 14:31:09

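Unless you have redefined ^, it computes XOR, not exponentiation. Also, you should be careful not to overflow the maximum value of an int before performing the %.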
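One way to sidestep both concerns is to build the high-order factor by repeated multiplication and reduce modulo q at every step. This is only a sketch; the helper name `highOrderFactor` is hypothetical, and it assumes int inputs so that each intermediate product fits in a 64-bit value.

```cpp
#include <cassert>

// Hypothetical helper: computes d^(m-1) mod q without pow() and without
// letting an intermediate value overflow.
int highOrderFactor(int d, int m, int q)
{
    assert(d >= 0 && m >= 1 && q > 0);
    long long h = 1 % q;
    for (int i = 1; i < m; ++i)
        h = (h * d) % q;  // h stays below q, so h * d fits in 64 bits
    return static_cast<int>(h);
}
```

For the question's parameters, `highOrderFactor(10, 5, 13)` gives 3, since 10^4 = 10000 = 769 * 13 + 3. The same loop keeps working for long patterns where 10^(m-1) itself would no longer fit in an int.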
