Implementing Levenshtein with PHP and a database

Posted 2024-12-20 22:15:11


I have a search form. If the user makes a typo like "ager" instead of "anger", it should still show the relevant results instead of displaying "0 results found".

I came across PHP's levenshtein() function, and the array example in its documentation is exactly what I want (except that the user can enter a sentence rather than a single word). However, I would like to implement it against the database, and I have no idea how to go about that.
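PHP's levenshtein($a, $b) returns the edit distance between two strings. As a minimal sketch of extending the manual's closest-word idea from one word to a whole sentence, the input can be split into words and each word replaced by its nearest neighbour from a word list. The function name correctSentence(), the $dictionary array and the $maxDistance cut-off are hypothetical; in practice the list might be loaded from the database (for example the distinct words of the searchable texts).

```php
<?php
// Sketch: correct each word of a search sentence against a known word list
// using PHP's built-in levenshtein(). correctSentence(), $dictionary and
// $maxDistance are hypothetical names chosen for this illustration.
function correctSentence(string $sentence, array $dictionary, int $maxDistance = 2): array
{
    // Lower-case and split on anything that is not an ASCII letter a-z;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces.
    $words = preg_split('/[^a-z]+/', strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);

    $corrected = [];
    foreach ($words as $word) {
        $best = $word;
        $bestDistance = PHP_INT_MAX;
        foreach ($dictionary as $candidate) {
            $distance = levenshtein($word, $candidate);
            if ($distance < $bestDistance) {
                $best = $candidate;
                $bestDistance = $distance;
                if ($distance === 0) {
                    break; // exact match: nothing can be closer
                }
            }
        }
        // Keep the user's own word if nothing in the list is close enough.
        $corrected[] = ($bestDistance <= $maxDistance) ? $best : $word;
    }
    return $corrected;
}
```

For example, correctSentence('ager issues', ['anger', 'issues', 'joy']) returns ['anger', 'issues']; the corrected words could then be used in an ordinary exact-match query.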

This is my code:

if (!empty($search))
{
    try {
        // $sql is built elsewhere (not shown in the question)
        $query = $this->_db->prepare($sql);
        $query->execute();
        if ($query->rowCount() > 0)
        {
            $foundRows = $this->_db->query("SELECT FOUND_ROWS()")->fetchColumn();
            $search_result = array();
            while ($row = $query->fetch(PDO::FETCH_ASSOC))
            {
                $cQuote = $this->highlightWords(htmlspecialchars($row['cQuotes']), $search);
                $search_result[] = array(
                    'success'   => true,
                    'totalRows' => $foundRows,
                    'cQuotes'   => $cQuote,
                    'vAuthor'   => $this->h($row['vAuthor']),
                    'vBookName' => $this->h($row['vBookName']),
                    'vRef'      => $this->h($row['vRef'])
                );
            }
            // Close the cursor before returning (it was unreachable after return before)
            $query->closeCursor();
            echo json_encode($search_result);
            return TRUE;
        }
        else
        {
            $query->closeCursor();
            $this->errorMsg("No results found for " . $search);
        }
    }
    catch (Exception $ex) {
        // Report the message only, not the full exception dump with stack trace
        $this->errorMsg("Problem: " . $ex->getMessage());
    }
}
else
{
    $this->errorMsg("Please enter something");
}

I should add that I'm using MySQL + PDO.


虫児飞 2024-12-27 22:15:11

For this to work, you'd need three things:

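1. A Levenshtein-distance implementation on the MySQL side (MySQL has no built-in LEVENSHTEIN() function, so this has to be a stored function or UDF that you install)
2. An index of the words found in each text
3. A query built by splitting the search string into words, with one LEFT JOIN per word and a HAVING clause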
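Whatever stored function or UDF ends up providing LEVENSHTEIN() has to compute the standard dynamic-programming edit distance (unit cost for insertion, deletion and substitution). As a reference for porting, here is a minimal PHP sketch of that recurrence using two rows; levenshteinDistance() is a hypothetical name, and the results can be checked against PHP's own levenshtein().

```php
<?php
// Reference sketch of the classic dynamic-programming Levenshtein distance,
// i.e. what a MySQL LEVENSHTEIN() stored function would need to compute.
// Byte-based, like PHP's built-in levenshtein(). Hypothetical helper name.
function levenshteinDistance(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);

    // $prev is the previous DP row: $prev[$j] is the distance between the
    // prefix of $a processed so far and the first $j bytes of $b.
    $prev = range(0, $lenB); // empty prefix of $a: $j insertions
    for ($i = 1; $i <= $lenA; $i++) {
        $curr = [$i]; // turning $i bytes of $a into "" takes $i deletions
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,        // delete $a[$i - 1]
                $curr[$j - 1] + 1,    // insert $b[$j - 1]
                $prev[$j - 1] + $cost // substitute, or keep if equal
            );
        }
        $prev = $curr;
    }
    return $prev[$lenB];
}
```

Only two rows are kept at a time, which is also the usual shape of SQL stored-function ports, since they typically hold the rows in string or variable buffers rather than a full matrix.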

An example database schema:

text

+---------+----------------------------------------------+
| text_id | text                                         |
+---------+----------------------------------------------+
|       1 | The quick brown fox jumps over the lazy dog  |
|       2 | The slow brown foxes jump over the lazy dogs |
+---------+----------------------------------------------+

word

+-------+---------+
| word  | text_id |
+-------+---------+
| fox   |       1 |
| foxes |       2 |
| dog   |       1 |
| dogs  |       2 |
+-------+---------+
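The word table has to be filled whenever a text is added. A minimal sketch with PDO, assuming the example schema above (columns word.word and word.text_id); extractWords() and indexText() are hypothetical helpers, and splitting on anything other than the ASCII letters a-z is a simplification that would break accented words apart.

```php
<?php
// Sketch: populate the word table for one text. extractWords() and
// indexText() are hypothetical helpers; table and column names follow
// the example schema (word.word, word.text_id).
function extractWords(string $text): array
{
    // Lower-case, split on anything that is not an ASCII letter, de-duplicate.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

function indexText(PDO $db, int $textId, string $text): void
{
    // One prepared statement, executed once per distinct word.
    $insert = $db->prepare('INSERT INTO word (word, text_id) VALUES (?, ?)');
    foreach (extractWords($text) as $word) {
        $insert->execute([$word, $textId]);
    }
}
```

This indexes every distinct word, including common ones like "the"; skipping such words keeps the table smaller, at the cost of never matching them.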

With that in place, say someone searches for "foxs dogg"; you'd build a query like this one:

SELECT text.text FROM text
    LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, 'foxs') < 3
    LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, 'dogg') < 3
    GROUP BY text.text_id
    HAVING COUNT(w1.word) > 0 AND COUNT(w2.word) > 0

...where:

Each search word has its own LEFT JOIN (e.g. foxs and dogg)
The HAVING clause requires at least one match for every search word (e.g. HAVING COUNT(w1.word) > 0 AND COUNT(w2.word) > 0); a plain HAVING COUNT(*) = 2 would count the rows produced by the joins rather than the matched words, so it can both miss and wrongly include texts
The maximum distance for each word is specified (e.g. LEVENSHTEIN(...) < 3)

The above would return both entries.

Here's another example:

SELECT text.text FROM text
    LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, 'foxs') < 3
    LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, 'slows') < 3
    GROUP BY text.text_id
    HAVING COUNT(w1.word) > 0 AND COUNT(w2.word) > 0

The above would return only text_id = 2, provided the word index also contains "slow" for text 2 (the word table excerpt above lists only fox, foxes, dog and dogs).
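Since the number of joins depends on how many words the user typed, the query has to be generated. A minimal PHP sketch, assuming a LEVENSHTEIN() stored function already exists in MySQL and the text/word schema above; buildFuzzyQuery() is a hypothetical helper. The search words are passed as bound parameters instead of being pasted into the SQL, and each word gets a COUNT(wN.word) > 0 condition, which requires every word to match no matter how many rows the joins produce.

```php
<?php
// Sketch: generate the fuzzy-search SQL for any number of search words.
// buildFuzzyQuery() is a hypothetical helper; it assumes a LEVENSHTEIN()
// stored function exists in MySQL and uses the text/word schema above.
function buildFuzzyQuery(string $search, int $maxDistance = 3): array
{
    // Split the user's input on whitespace, ignoring leading/trailing blanks.
    $words = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        throw new InvalidArgumentException('Search string contains no words');
    }

    $joins = [];
    $having = [];
    foreach ($words as $i => $word) {
        $alias = 'w' . ($i + 1);
        // One LEFT JOIN per search word; the word itself is a ? placeholder.
        // $maxDistance is typed int, so interpolating it cannot inject SQL.
        $joins[] = "LEFT JOIN word $alias ON $alias.text_id = text.text_id"
                 . " AND LEVENSHTEIN($alias.word, ?) < $maxDistance";
        // COUNT(column) ignores the NULLs a LEFT JOIN yields when nothing matched.
        $having[] = "COUNT($alias.word) > 0";
    }

    $sql = "SELECT text.text FROM text\n    "
         . implode("\n    ", $joins)
         . "\n    GROUP BY text.text_id"
         . "\n    HAVING " . implode(' AND ', $having);

    // The parameters are the search words, in placeholder order.
    return [$sql, $words];
}
```

In the question's code this would replace the hand-written $sql: list($sql, $params) = buildFuzzyQuery($search); then $query = $this->_db->prepare($sql); and $query->execute($params);.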

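Now, before you go crazy implementing this, you should know that the LEVENSHTEIN(...) conditions cannot use an index: MySQL has to evaluate the function for every candidate row of every word join, so on a table with millions of entries (words), multiple JOIN clauses like the above will have a very big performance impact.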

While this is a working example, you really should look for an already implemented search algorithm, like Solr's SpellCheck component.
