支持模糊搜索的最容易实现的站点搜索应用程序是什么?

发布于 2024-08-14 14:08:53 字数 221 浏览 10 评论 0原文

我有一个网站需要搜索大约 20-30k 条记录,其中大部分是电影和电视节目名称。该站点使用 memcache 运行 php/mysql。

我希望用我当前拥有的 soundex() 搜索替换 FULLTEXT ,这可以工作......有点,但在很多情况下都不是很好。

是否有任何像样的搜索脚本易于实现,并且将提供像样的搜索功能(表中的 3 列)。

I have a site that needs to search thru about 20-30k records, which are mostly movie and TV show names. The site runs php/mysql with memcache.

Im looking to replace the FULLTEXT with soundex() searching that I currently have, which works... kind of, but isn't very good in many situations.

Are there any decent search scripts out there that are simple to implement, and will provide a decent searching capability (of 3 columns in a table).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

冬天的雪花 2024-08-21 14:08:53

ewemli 的答案是正确的方向,但您应该结合 FULLTEXT 和 soundex 映射,而不是替换全文,否则您的 LIKE 查询可能会非常慢。

create table with_soundex (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  original TEXT,
  soundex TEXT,
  FULLTEXT (soundex)
);

insert into with_soundex (original, soundex) values 

('add some test cases', CONCAT_WS(' ', soundex('add'), soundex('some'), soundex('test'), soundex('cases'))),
('this is some text', CONCAT_WS(' ', soundex('this'), soundex('is'), soundex('some'), soundex('text'))),
('one more test case', CONCAT_WS(' ', soundex('one'), soundex('more'), soundex('test'), soundex('case'))),
('just filling the index', CONCAT_WS(' ', soundex('just'), soundex('filling'), soundex('the'), soundex('index'))),
('need one more example', CONCAT_WS(' ', soundex('need'), soundex('one'), soundex('more'), soundex('example'))),
('seems to need more', CONCAT_WS(' ', soundex('seems'), soundex('to'), soundex('need'), soundex('more')))
('some helpful cases to consider', CONCAT_WS(' ', soundex('some'), soundex('helpful'), soundex('cases'), soundex('to'), soundex('consider')))

select * from with_soundex where match(soundex) against (soundex('test'));
+----+---------------------+---------------------+
| id | original            | soundex             |
+----+---------------------+---------------------+
|  1 | add some test cases | A300 S500 T230 C000 | 
|  2 | this is some text   | T200 I200 S500 T230 | 
|  3 | one more test case  | O500 M600 T230 C000 | 
+----+---------------------+---------------------+

select * from with_soundex where match(soundex) against (CONCAT_WS(' ', soundex('test'), soundex('some')));
+----+--------------------------------+---------------------------+
| id | original                       | soundex                   |
+----+--------------------------------+---------------------------+
|  1 | add some test cases            | A300 S500 T230 C000       | 
|  2 | this is some text              | T200 I200 S500 T230       | 
|  3 | one more test case             | O500 M600 T230 C000       | 
|  7 | some helpful cases to consider | S500 H414 C000 T000 C5236 | 
+----+--------------------------------+---------------------------+

这提供了相当好的结果(在 soundex 算法的限制内),同时最大限度地利用索引(任何像 '%foo' 这样的查询都必须扫描表中的每一行)。

请注意对每个单词而不是整个短语运行 soundex 的重要性。您还可以对每个单词运行您自己的 soundex 版本,而不是让 SQL 执行此操作,但在这种情况下,请确保在存储和检索时都执行此操作,以防算法之间存在差异(例如,MySQL 的算法不限制本身到标准4个字符

ewemli's answer is in the right direction but you should be combining FULLTEXT and soundex mapping, not replacing the fulltext, otherwise your LIKE queries are likely be very slow.

create table with_soundex (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  original TEXT,
  soundex TEXT,
  FULLTEXT (soundex)
);

insert into with_soundex (original, soundex) values 

('add some test cases', CONCAT_WS(' ', soundex('add'), soundex('some'), soundex('test'), soundex('cases'))),
('this is some text', CONCAT_WS(' ', soundex('this'), soundex('is'), soundex('some'), soundex('text'))),
('one more test case', CONCAT_WS(' ', soundex('one'), soundex('more'), soundex('test'), soundex('case'))),
('just filling the index', CONCAT_WS(' ', soundex('just'), soundex('filling'), soundex('the'), soundex('index'))),
('need one more example', CONCAT_WS(' ', soundex('need'), soundex('one'), soundex('more'), soundex('example'))),
('seems to need more', CONCAT_WS(' ', soundex('seems'), soundex('to'), soundex('need'), soundex('more')))
('some helpful cases to consider', CONCAT_WS(' ', soundex('some'), soundex('helpful'), soundex('cases'), soundex('to'), soundex('consider')))

select * from with_soundex where match(soundex) against (soundex('test'));
+----+---------------------+---------------------+
| id | original            | soundex             |
+----+---------------------+---------------------+
|  1 | add some test cases | A300 S500 T230 C000 | 
|  2 | this is some text   | T200 I200 S500 T230 | 
|  3 | one more test case  | O500 M600 T230 C000 | 
+----+---------------------+---------------------+

select * from with_soundex where match(soundex) against (CONCAT_WS(' ', soundex('test'), soundex('some')));
+----+--------------------------------+---------------------------+
| id | original                       | soundex                   |
+----+--------------------------------+---------------------------+
|  1 | add some test cases            | A300 S500 T230 C000       | 
|  2 | this is some text              | T200 I200 S500 T230       | 
|  3 | one more test case             | O500 M600 T230 C000       | 
|  7 | some helpful cases to consider | S500 H414 C000 T000 C5236 | 
+----+--------------------------------+---------------------------+

That gives pretty good results (within the limits of the soundex algo) while taking maximum advantage of an index (any query LIKE '%foo' has to scan every row in the table).

Note the importance of running soundex on each word, not on the entire phrase. You could also run your own version of soundex on each word rather than having SQL do it but in that case make sure you do it both when storing and retrieving in case there are differences between the algorithms (for instance, MySQL's algo doesn't limit itself to the standard 4 chars)

2024-08-21 14:08:53

如果您正在寻找简单的现有解决方案而不是创建自己的解决方案,请查看

If you are looking for a simple existing solution instead of creating your own solution check out

孤独患者 2024-08-21 14:08:53

mysql中有一个函数SOUNDEX。如果你想搜索电影标题:

select * from movie where soundex(title) = soundex( 'the title' );

当然,在文本中搜索是不行的,比如电影或剧情摘要。


Soundex 是一个相对简单的算法。您还可以决定在应用级别处理所有这些,这可能会更容易:

  • 存储文本时,对其进行标记并对所有单词应用 soundex
  • 将原始文本和 soundex 版本存储在两列中,
  • 当您搜索时,计算 soundex该应用程序。 level,然后在数据库级别使用常规的LIKE

There is a function SOUNDEX in mysql. If you want to search for a movie title :

select * from movie where soundex(title) = soundex( 'the title' );

Of course it doesn't work to search in text, such as movie or plot summary.


Soundex is a relatively simple algo. You can also decide to handle all that at the applicative level, it may be easier:

  • when text is stored, tokenize it and apply soundex on all words
  • store the original text and the soundex version in two columns
  • when you search, compute the soundex at the app. level and then use a regular LIKE at the db level.
热鲨 2024-08-21 14:08:53

Soundex 在处理模糊搜索方面存在局限性。一个更好的功能是编辑距离,可以使用UDF集成到MySQL中。检查 http://flamingo.ics.uci.edu/toolkit/ 的 C++ 实现Linux 上的 MySQL。

Soundex has limitations to deal with fuzzy search. A better function is edit distance, which can be integrated into MySQL using UDF. Check http://flamingo.ics.uci.edu/toolkit/ for a C++ implementation for MySQL on Linux.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文