PHP 中的 Rabin-Karp 算法

发布于 2024-08-03 21:58:34 字数 49 浏览 5 评论 0原文

我想知道是否有人可以分享 Rabin-Karp 算法的源代码?

谢谢

I was wondering if anyone can share a source for Rabin-Karp algorithm?

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

浮华 2024-08-10 21:58:34

这是Karp-Rabin 算法的 C 实现的端口< /a>:

function KR($haystack, $needle) {
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m > $n) {
        return -1;
    }
    /* Preprocessing */
    $d = 1 << ($m - 1);
    for ($hh = $hn = $i = 0; $i < $m; ++$i) {
        $hh = (($hh<<1) + ord($haystack[$i]));
        $hn = (($hn<<1) + ord($needle[$i]));
    }
    /* Searching */
    $j = 0;
    while ($j <= $n-$m) {
        if ($hh == $hn && substr($haystack, $j, $m) === $needle) {
            return $j;
        }
        if ($j === $n-$m) {
            return false;
        }
        /* Rehashing */
        $hh = (($hh - ord($haystack[$j]) * $d) << 1) + ord($haystack[$j + $m]);
        ++$j;
    }
    return false;
}

This is a port of this C implementation of the Karp-Rabin algorithm:

function KR($haystack, $needle) {
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m > $n) {
        return -1;
    }
    /* Preprocessing */
    $d = 1 << ($m - 1);
    for ($hh = $hn = $i = 0; $i < $m; ++$i) {
        $hh = (($hh<<1) + ord($haystack[$i]));
        $hn = (($hn<<1) + ord($needle[$i]));
    }
    /* Searching */
    $j = 0;
    while ($j <= $n-$m) {
        if ($hh == $hn && substr($haystack, $j, $m) === $needle) {
            return $j;
        }
        if ($j === $n-$m) {
            return false;
        }
        /* Rehashing */
        $hh = (($hh - ord($haystack[$j]) * $d) << 1) + ord($haystack[$j + $m]);
        ++$j;
    }
    return false;
}
著墨染雨君画夕 2024-08-10 21:58:34

为了便于说明,这是上面 Gumbo 的答案的一个稍微修改的版本,具有更简单的散列和更清晰的变量命名。

在下面的说明性散列中,我只是将每个字符的 ord() 值添加到一个代表散列的数字中,然后在推进搜索时减去该值/添加下一个字符的 ord() 。 这非常容易发生冲突(因此不利于生产),但如果您只是从概念上学习 Rabin-Karp,则更容易理解。

function rk ($needle, $haystack)
{
    $nlen = strlen($needle);
    $hlen = strlen($haystack);
    $nhash = 0;
    $hhash = 0;

    // Special cases that don't require the rk algo:
    // if needle is longer than haystack, no possible match
    if ($nlen > $hlen) {
        return false;
    }
    // If they're the same size, they must just match
    if ($nlen == $hlen) {
        return ($needle === $haystack);
    }

    // Compute hash of $needle and $haystack[0..needle.length]
    // This is a very primitive hashing method for illustrative purposes
    // only. You'll want to modify each value based on its position in
    // the string as per Gumbo's example above (left shifting)
    for ($i = 0; $i < $nlen; ++$i) {
        $nhash += ord($needle[$i]);
        $hhash += ord($haystack[$i]);
    }

    // Go through each position of needle and see if
    // the hashes match, then do a comparison at that point
    for ($i = 0, $c = $hlen - $nlen; $i <= $c; ++$i) {
        // If the hashes match, there's a good chance the next $nlen characters of $haystack matches $needle
        if ($nhash == $hhash && $needle === substr($haystack, $i, $nlen)) {
            return $i;
        }
        // If we've reached the end, don't try to update the hash with
        // the code following this if()
        if ($i == $c) {
            return false;
        }

        // Update hhash to the next position by subtracting the
        // letter we're removing and adding the letter we're adding
        $hhash = ($hhash - ord($haystack[$i])) + ord($haystack[$i + $nlen]);
    }

    return false;
}

Here's a slightly altered version Gumbo's answer above, with simpler hashing and clearer variable naming, for the sake of illustration.

In the illustrative hashing below I'm just adding the ord() value of each character to a number, which represents the hash, then subtracting that value/adding the ord() of the next char when advancing our search. This is very collision-prone (and therefore not good for production), but it's easier to understand if you're just learning Rabin-Karp conceptually.

function rk ($needle, $haystack)
{
    $nlen = strlen($needle);
    $hlen = strlen($haystack);
    $nhash = 0;
    $hhash = 0;

    // Special cases that don't require the rk algo:
    // if needle is longer than haystack, no possible match
    if ($nlen > $hlen) {
        return false;
    }
    // If they're the same size, they must just match
    if ($nlen == $hlen) {
        return ($needle === $haystack);
    }

    // Compute hash of $needle and $haystack[0..needle.length]
    // This is a very primitive hashing method for illustrative purposes
    // only. You'll want to modify each value based on its position in
    // the string as per Gumbo's example above (left shifting)
    for ($i = 0; $i < $nlen; ++$i) {
        $nhash += ord($needle[$i]);
        $hhash += ord($haystack[$i]);
    }

    // Go through each position of needle and see if
    // the hashes match, then do a comparison at that point
    for ($i = 0, $c = $hlen - $nlen; $i <= $c; ++$i) {
        // If the hashes match, there's a good chance the next $nlen characters of $haystack matches $needle
        if ($nhash == $hhash && $needle === substr($haystack, $i, $nlen)) {
            return $i;
        }
        // If we've reached the end, don't try to update the hash with
        // the code following this if()
        if ($i == $c) {
            return false;
        }

        // Update hhash to the next position by subtracting the
        // letter we're removing and adding the letter we're adding
        $hhash = ($hhash - ord($haystack[$i])) + ord($haystack[$i + $nlen]);
    }

    return false;
}
心凉 2024-08-10 21:58:34

试试这个。在将其发送到 match_rabinKarp() 之前,您必须从 $needle$haystack 中删除标点符号,但这基本上遵循给定的算法在维基百科页面上。

// this hash function is friendly, according to the wikipedia page
function hash($string) {
 $base = ord('a');
 if (strlen($string) == 1) {
  return ord($string);
 } else {
  $result = 0;
  // sum each of the character*(base^i)
  for ($i=strlen($string)-1; $i>=0; $i++) {
   $result += $string[$i]*pow($base,$i);
  }
  return $result;
 }
}
// perform the actual match
function match_rabinKarp($needle, $haystack) {
 $needle = substr($needle);      // remove capitals
 $haystack = substr($haystack);  // remove capitals
 $m = strlen($needle);           // length of $needle
 $n = strlen($haystack);         // length of $haystack
 $h_haystack = hash($haystack);  // hash of $haystack
 $h_needle = hash($needle);      // hash of $needle
 // whittle away at the $haystack until we find a match
 for ($i=0;$i<$n-$m+1;$i++) {
  if ($h_needle == $h_haystack) {
   if (substr($haystack,$i,$i+$m-1) == $needle) {
    return $i;
   }
  }
 }
 return false;
}

Try this out. You will have to strip the punctuation from the $needle and $haystack before sending it to match_rabinKarp(), but this basically follows the algorithm given on the wikipedia page.

// this hash function is friendly, according to the wikipedia page
function hash($string) {
 $base = ord('a');
 if (strlen($string) == 1) {
  return ord($string);
 } else {
  $result = 0;
  // sum each of the character*(base^i)
  for ($i=strlen($string)-1; $i>=0; $i++) {
   $result += $string[$i]*pow($base,$i);
  }
  return $result;
 }
}
// perform the actual match
function match_rabinKarp($needle, $haystack) {
 $needle = substr($needle);      // remove capitals
 $haystack = substr($haystack);  // remove capitals
 $m = strlen($needle);           // length of $needle
 $n = strlen($haystack);         // length of $haystack
 $h_haystack = hash($haystack);  // hash of $haystack
 $h_needle = hash($needle);      // hash of $needle
 // whittle away at the $haystack until we find a match
 for ($i=0;$i<$n-$m+1;$i++) {
  if ($h_needle == $h_haystack) {
   if (substr($haystack,$i,$i+$m-1) == $needle) {
    return $i;
   }
  }
 }
 return false;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文