编辑:MySQL + PHP

发布于 2024-10-11 15:47:45 字数 381 浏览 8 评论 0原文

$word = strtolower($_GET['term']); 

$lev = 0;

$q = mysql_query("SELECT `term` FROM `words`"); 
while($r = mysql_fetch_assoc($q)) 
{ 
    $r['term'] = strtolower($r['term']); 

    $lev = levenshtein($word, $r['term']);

    if($lev >= 0 && $lev < 5)
    {
        $word = $r['term'];
    }
}

我怎样才能将所有这些转移到一个查询中?不想查询所有术语并在 PHP 中进行过滤。

$word = strtolower($_GET['term']); 

$lev = 0;

$q = mysql_query("SELECT `term` FROM `words`"); 
while($r = mysql_fetch_assoc($q)) 
{ 
    $r['term'] = strtolower($r['term']); 

    $lev = levenshtein($word, $r['term']);

    if($lev >= 0 && $lev < 5)
    {
        $word = $r['term'];
    }
}

How can I move all that into just one query? Don't want to have to query through all terms and do the filtering in PHP.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

因为看清所以看轻 2024-10-18 15:47:45

您需要 MySQL 中的 levenshtein 函数 并进行查询

$word = mysql_real_escape_string($word);
mysql_qery("SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND 4");

You need a levenshtein function in MySQL and query like

$word = mysql_real_escape_string($word);
mysql_qery("SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND 4");
心头的小情儿 2024-10-18 15:47:45

在 MySQL 中实现 Levenshtein 函数有两种方法。第一个是创建一个存储函数,其操作方式与存储事务非常相似,只是它具有不同的输入和输出。这对于小型数据集来说很好,但对于接近几千行的数据集来说有点慢。

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC

BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
  RETURN 0;
ELSEIF s1_len = 0 THEN
  RETURN s2_len;
ELSEIF s2_len = 0 THEN
  RETURN s1_len;
ELSE
  WHILE j <= s2_len DO
    SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
  END WHILE;
  WHILE i <= s1_len DO
    SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
    WHILE j <= s2_len DO
    SET c = c + 1;
    IF s1_char = SUBSTRING(s2, j, 1) THEN
      SET cost = 0; ELSE SET cost = 1;
    END IF;
    SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
    IF c > c_temp THEN SET c = c_temp; END IF;
      SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
      IF c > c_temp THEN
        SET c = c_temp;
      END IF;
      SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
    END WHILE;
    SET cv1 = cv0, i = i + 1;
  END WHILE;
END IF;

RETURN c;

END//

将上述代码存储在 .sql 文件中并将其导入数据库,如下所示:

source /tmp/mysql_udf.sql

第二种方法是用 C/C++ 实现用户定义函数并将其作为共享库(*.so 文件)链接到 MySQL。此方法还使用存储函数来调用库,这意味着此方法或第一个方法的实际查询可能相同(假设两个函数的输入相同)。您可以在此处找到有关此方法的更多信息: http://samjlevy.com /mysql-levenshtein-and-damerau-levenshtein-udfs/

使用这两种方法中的任何一种,您的查询将类似于:

SELECT term FROM words WHERE levenshtein(term, 'term') < 5;

另外,请记住,“阈值”值应根据原始单词长度而变化。最好用百分比值来考虑,即一半单词 = 50%,一半“术语”= 2。

There are two ways to implement a Levenshtein function in MySQL. The first is to create a STORED FUNCTION which operates much like a STORED TRANSACTION, except it has distinct inputs and an output. This is fine for small datasets, but a little slow on anything approaching several thousand rows.

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC

BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
  RETURN 0;
ELSEIF s1_len = 0 THEN
  RETURN s2_len;
ELSEIF s2_len = 0 THEN
  RETURN s1_len;
ELSE
  WHILE j <= s2_len DO
    SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
  END WHILE;
  WHILE i <= s1_len DO
    SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
    WHILE j <= s2_len DO
    SET c = c + 1;
    IF s1_char = SUBSTRING(s2, j, 1) THEN
      SET cost = 0; ELSE SET cost = 1;
    END IF;
    SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
    IF c > c_temp THEN SET c = c_temp; END IF;
      SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
      IF c > c_temp THEN
        SET c = c_temp;
      END IF;
      SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
    END WHILE;
    SET cv1 = cv0, i = i + 1;
  END WHILE;
END IF;

RETURN c;

END//

Store the above code in a .sql file and import it into your database like so:

source /tmp/mysql_udf.sql

The second method is to implement a User Defined Function in C/C++ and link it into MySQL as a shared library (*.so file). This method also uses a STORED FUNCTION to call the library, which means the actual query for this or the first method may be identical (providing the inputs to both functions are the same). You can find out more about this method here: http://samjlevy.com/mysql-levenshtein-and-damerau-levenshtein-udfs/

With either of these methods, your query would be something like:

SELECT term FROM words WHERE levenshtein(term, 'term') < 5;

Also, remember that the 'threshold' value should change in relation to the original word length. It's better to think of it in terms of a percentage value, i.e. half your word = 50%, half of 'term' = 2.

街道布景 2024-10-18 15:47:45

如果你有一个巨大的数据库,你可以首先使用 SOUNDEX 过滤单词:

$word = strtolower(mysql_real_escape_string($_GET['term']));

$rs = mysql_query("SELECT LOWER(`term`) FROM `words` WHERE SOUNDEX(term) = SOUNDEX(" . $word . ")");

while ($row = mysql_fetch_assoc($rs)) { 

    $lev = levenshtein($word, $row['term']);

    ....

}

如果你有足够的时间使用 C 扩展或过程,你可能会获得更好的性能,但是在应用真正的 levenshtein 之前过滤 mysql 上的记录会让事情变得更快几乎不费吹灰之力。

If you have a huge database, you can filter the words first using SOUNDEX:

$word = strtolower(mysql_real_escape_string($_GET['term']));

$rs = mysql_query("SELECT LOWER(`term`) FROM `words` WHERE SOUNDEX(term) = SOUNDEX(" . $word . ")");

while ($row = mysql_fetch_assoc($rs)) { 

    $lev = levenshtein($word, $row['term']);

    ....

}

If you have time enough to play with a C extension or procedure, you may achieve better performance, but filtering the records on mysql before applying real levenshtein will make things faster with almost no effort.

久伴你 2024-10-18 15:47:45

如果您正在处理非常大的数据集,我发现在 PHP 中处理 Levenshtein 操作和排序比在 MySQL 中更有效。
例如查询约1000条记录:

MySQL(~0.0050s) -> PHP Levenshtein(~ 1.300s)

MySQL Levenshtein(>= 5.000s)-> PHP(~ 0.250s)

还有许多其他选项可用于优化搜索引擎,但如果您想使用 Levenshtein,只需注意您将处理的数据和您想要的延迟。

If you are dealing with very large data sets I have found that it is much more efficient to handle the Levenshtein operations and sorting in PHP than it is in MySQL.
e.g. query of about 1000 records:

MySQL( ~ 0.0050s) -> PHP Levenshtein( ~ 1.300s)

vs.

MySQL Levenshtein( >= 5.000s) -> PHP( ~ 0.250s)

There are also many other options for optimizing search engines but if you want to use Levenshtein just be aware of the data you'll be handling and the latencies you want.

网白 2024-10-18 15:47:45

您可以使这段代码看起来更整洁,但@profitphp是对的,如果没有levenstein库,您就无法在MySQL中执行此操作。

$word = strtolower($_GET['term']);

$q = mysql_uqery("SELECT LOWER(`term`) FROM `words`");

while($r = mysql_fetch_assoc($q)) { 

    $lev = levenshtein($word, $r['term']);

    ....

}

You can make this code look a bit neater but @profitphp is right, you can't doing it in MySQL without a levenstein library.

$word = strtolower($_GET['term']);

$q = mysql_uqery("SELECT LOWER(`term`) FROM `words`");

while($r = mysql_fetch_assoc($q)) { 

    $lev = levenshtein($word, $r['term']);

    ....

}
只涨不跌 2024-10-18 15:47:45

我建议您包括 levenshtein 的调用(链接:http://www.artfulsoftware.com /infotree/queries.php#552)
进入你的查询。

您应该使用 mysqli_query($q) 因为 mysql_query($q) 已被弃用并且
可能会在未来的 php 版本中删除!

$word = mysql_real_escape_string($word);
$query = "SELECT `term` FROM `words` WHERE levenshtein('$word', `term`)   BETWEEN 0 AND 4";
mysqli_qery($query);

I suggest you including the call of the levenshtein(link: http://www.artfulsoftware.com/infotree/queries.php#552)
into your query.

You should use mysqli_query($q) because mysql_query($q) is deprecated and
may be removed in future php versions!

$word = mysql_real_escape_string($word);
$query = "SELECT `term` FROM `words` WHERE levenshtein('$word', `term`)   BETWEEN 0 AND 4";
mysqli_qery($query);
内心旳酸楚 2024-10-18 15:47:45

我在 Oracle 中通过在可调用的函数内实现 PL/SQL 中的算法来实现此目的。

I do this in Oracle by implementing the algorithm in PL/SQL inside a function that can be called.

岁吢 2024-10-18 15:47:45

这是一个查询。如果您询问是否可以将 levenshtein 功能移至 mysql,那么您不能。

好吧,你可以,但这并不比在 php 中完成更容易。

http://www.artfulsoftware.com/infotree/queries.php ?&bw=1280#552

That is one query. If you're asking if you can move the levenshtein functionality to mysql, you can't.

Ok, well you can, but its not any easier than just doing it in php.

http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#552

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文