How to "redefine the search" or correct "misspellings" from the database
I want to add a new feature to the search on my website. I'm using PHP and MySQL.
The MySQL database contains a table of the items that users search for. Each item has a "keyword" column holding comma-separated keywords (example: cat,dog,horse). After a user searches on my website, I want to get the words that are, let's say, "85%" similar to their search keyword; this is for redefining the search. For misspellings, I want a service or something similar that tells me whether the keyword is correct or misspelled, so that I can get some corrections, check whether they exist in the database, and then offer those corrections to the user to change their search keyword.
I'm not asking for a complete solution here, but if you can point me in one direction or another, that would be great.
Comments (4)
The key is in your idea of "85% similar". Here are some ideas:
Similar Words Table
You can define a table where you list common misspellings for your keywords. You'll then have to augment how you search the database to map common misspellings to the proper value.
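A minimal sketch of the lookup-table idea, with a PHP array standing in for the database table. The entries and the function name normalizeKeyword are made up for illustration; in practice the map would live in a MySQL table with a misspelling column and a keyword column.

```php
<?php
// Hypothetical misspelling => canonical keyword map (stand-in for a DB table).
$misspellings = [
    'kat'  => 'cat',
    'catt' => 'cat',
    'dogg' => 'dog',
    'hors' => 'horse',
];

// Return the canonical keyword for a search term, or the cleaned-up term
// itself (trimmed, lowercased) when no mapping is known.
function normalizeKeyword(string $term, array $map): string
{
    $key = strtolower(trim($term));
    return $map[$key] ?? $key;
}

echo normalizeKeyword(' Kat ', $misspellings), "\n"; // prints "cat"
```

The obvious drawback is that the table only knows the misspellings someone has entered, so it works best combined with one of the other ideas.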
Similar Words Lookup
When you perform the search, use a library to generate similar words and search for all of them. You can use any sort of spelling library to generate possible word matches before sending the search. Or write your own based on the Edit Distance algorithm.
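Once you have a list of candidate words, "search for all of them" could look roughly like this. It is only a sketch: the table name items is an assumption, the keyword column comes from the question, and FIND_IN_SET is the MySQL function for matching one value inside a comma-separated list (it expects no spaces after the commas).

```php
<?php
// Build a parameterized query that matches any candidate word against the
// comma-separated `keyword` column. Returns [sql, params].
function buildKeywordQuery(array $candidates): array
{
    $candidates = array_values(array_unique($candidates));
    if ($candidates === []) {
        throw new InvalidArgumentException('At least one candidate is required');
    }
    // One FIND_IN_SET(?, keyword) test per candidate, OR-ed together.
    $conditions = array_fill(0, count($candidates), 'FIND_IN_SET(?, keyword) > 0');
    $sql = 'SELECT * FROM items WHERE ' . implode(' OR ', $conditions);
    return [$sql, $candidates];
}

[$sql, $params] = buildKeywordQuery(['cat', 'cart', 'cot']);
echo $sql, "\n";
// With PDO this would run as:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
```

Using placeholders keeps user input out of the SQL string, which matters here because the candidates are derived from whatever the user typed.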
Only check if needed:
Since you're using PHP, you may consider pspell. You can first call pspell_check to see if the word is spelled correctly, then call pspell_suggest to get suggestions. See this link for an example.
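A rough sketch of that check-then-suggest flow. It assumes the pspell extension and an English dictionary are installed; the wrapper name suggestCorrections is made up, and it simply returns null when pspell is not available.

```php
<?php
// Return spelling suggestions for $word, an empty array if the word is
// spelled correctly, or null when pspell or the dictionary is unavailable.
function suggestCorrections(string $word, string $language = 'en'): ?array
{
    if (!function_exists('pspell_new')) {
        return null; // pspell extension not installed
    }
    $dictionary = @pspell_new($language);
    if ($dictionary === false) {
        return null; // no dictionary for this language
    }
    if (pspell_check($dictionary, $word)) {
        return []; // spelled correctly, nothing to suggest
    }
    return pspell_suggest($dictionary, $word) ?: [];
}

$suggestions = suggestCorrections('hors');
echo $suggestions === null
    ? "pspell is not available\n"
    : 'Suggestions: ' . implode(', ', $suggestions) . "\n";
```

The suggestions would still need to be checked against the keyword column before showing them, since a dictionary word is not necessarily one of your keywords.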
Use a Database Feature
MySQL, for example, has a SOUNDS LIKE operator. You can search for WHERE keyword SOUNDS LIKE 'kat' and (presumably) get cat, although the operator compares Soundex codes, which keep the first letter, so a misspelling that changes the first letter may not match. More info is on the documentation page, which alerts you to some limitations (it is intended for English strings and is not guaranteed to work with multibyte character sets such as UTF-8).
It sounds like a fairly common problem, so perhaps there are other more canonical solutions to it. Perhaps there's something specific to the language you're using (or in the database interface layer) that can abstract this for you.
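To get a feel for phonetic matching before touching the database, you can try the same idea in PHP, which ships soundex() and metaphone(). This is a sketch only: the helper phoneticMatches is made up, and PHP's soundex() is not guaranteed to produce the same codes as MySQL's SOUNDEX(), so treat it as an experiment rather than a preview of the query results.

```php
<?php
// Return the keywords from a comma-separated list whose phonetic code equals
// that of the search term. $encoder is a callable such as 'soundex' or 'metaphone'.
function phoneticMatches(string $term, string $keywordList, callable $encoder): array
{
    $target = $encoder($term);
    $matches = [];
    foreach (explode(',', $keywordList) as $keyword) {
        $keyword = trim($keyword);
        if ($keyword !== '' && $encoder($keyword) === $target) {
            $matches[] = $keyword;
        }
    }
    return $matches;
}

$keywords = 'cat,dog,horse';
print_r(phoneticMatches('hoarse', $keywords, 'soundex'));   // matches "horse"
print_r(phoneticMatches('kat', $keywords, 'soundex'));      // no match: K300 vs C300
print_r(phoneticMatches('kat', $keywords, 'metaphone'));    // matches "cat"
```

Soundex keeps the first letter of the word, which is why 'kat' and 'cat' get different codes; metaphone maps both to the same code.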
The first two should allow you to meet some notion of 85% similarity. I have no idea how well the third option will work, but it "soundz kool."
There's similar_text() in PHP, but you'd have to apply it to the rows after the query returns; you could also check out Full-Text Search in MySQL.
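For example, something along these lines would filter one row's keyword list after the query, using the percentage that similar_text() reports. The function name similarKeywords and the 85 default are illustrative, taken from the "85%" in the question.

```php
<?php
// Keep the keywords that are at least $threshold percent similar to $term
// according to similar_text(). Returns keyword => percent, best match first.
function similarKeywords(string $term, string $keywordList, float $threshold = 85.0): array
{
    $scores = [];
    foreach (explode(',', $keywordList) as $keyword) {
        $keyword = trim($keyword);
        if ($keyword === '') {
            continue;
        }
        // The third argument receives the similarity as a percentage.
        similar_text(strtolower($term), strtolower($keyword), $percent);
        if ($percent >= $threshold) {
            $scores[$keyword] = $percent;
        }
    }
    arsort($scores); // highest percentage first
    return $scores;
}

print_r(similarKeywords('horsse', 'cat,dog,horse')); // only "horse" passes the threshold
```

Because this runs in PHP, it has to look at every keyword of every row you fetched, so it is better suited to re-ranking a small result set than to scanning the whole table.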
Try looking into the Edit Distance algorithm. Basically, for two input strings, the return value is the minimum number of edits needed to transform one string into the other. That can give you some idea of how close two strings are.
Edit Distance
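Here is a small sketch of the usual dynamic-programming version, in case you want to see how it works. PHP already has a built-in levenshtein() that does the same job, so this is for illustration. The similarityPercent helper is just one possible way to turn a distance into a percentage, not a standard definition.

```php
<?php
// Edit distance with insert, delete and substitute, each costing 1.
// Works on bytes, so it is only accurate for single-byte (e.g. ASCII) strings.
function editDistance(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    // $prev[$j]: distance between the first $i - 1 chars of $a and the first $j chars of $b.
    $prev = range(0, $lenB);
    for ($i = 1; $i <= $lenA; $i++) {
        $curr = [$i];
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution (or match)
            );
        }
        $prev = $curr;
    }
    return $prev[$lenB];
}

// Assumed normalization: 1 - distance / length of the longer string.
function similarityPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    return $longest === 0 ? 100.0 : (1 - editDistance($a, $b) / $longest) * 100;
}

echo editDistance('kitten', 'sitting'), "\n"; // prints 3
```

With short keywords a single typo already costs a large share of the percentage, so a fixed 85% cut-off may be too strict for words of five or six letters.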
Apache Solr is an open source search platform that provides not only full-text search capabilities but also built-in match scoring and auto-suggestion systems, among many other powerful features.
If the amount of information on your site is not that large, this option may sound like overkill, although I'd recommend at least checking it out.
The communication between your app and Solr can be handled through a standard REST interface. AFAIK there are two good Solr-specific PHP libraries available at the moment:
Setting up the server is pretty straightforward; the laborious part (as well as the interesting one) is tuning and optimizing Solr to best fit your needs.
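To give an idea of what talking to Solr over HTTP looks like without any library, here is a sketch that only builds the request URL for a fuzzy query. The host and port, the core name items, and the field name keyword are all assumptions for illustration; the "~1" suffix is Solr's fuzzy-match syntax in the standard query parser.

```php
<?php
// Build a Solr select URL for a fuzzy match on a single term.
// Assumptions: Solr at localhost:8983, a core named "items", a field named "keyword".
function buildSolrFuzzyUrl(string $term, int $maxEdits = 1): string
{
    // Keep the sketch simple: plain letters and digits only, so no
    // query-syntax escaping is needed.
    if (!preg_match('/^[A-Za-z0-9]+$/', $term)) {
        throw new InvalidArgumentException('Term must be alphanumeric');
    }
    // Solr accepts fuzzy distances from 0 to 2.
    if ($maxEdits < 0 || $maxEdits > 2) {
        throw new InvalidArgumentException('maxEdits must be between 0 and 2');
    }
    $params = [
        'q'  => sprintf('keyword:%s~%d', $term, $maxEdits),
        'wt' => 'json',
    ];
    return 'http://localhost:8983/solr/items/select?' . http_build_query($params);
}

echo buildSolrFuzzyUrl('horsse'), "\n";
// The response could then be fetched with file_get_contents() and decoded with json_decode().
```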