SQL - 使用 LIKE 运算符搜索数据库
鉴于您的数据存储在数据库中的某个位置:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
我们希望实现站点搜索,允许访问者输入术语并返回相关记录。用户可能会搜索:
Dinosaurs
并且 SQL:
WHERE articleBody LIKE '%Dinosaurs%'
可以很好地返回正确的记录集。
然而,如果用户拼错了恐龙,我们该如何应对? IE:(
Dinosores
可怜的恐龙)。我们怎样才能搜索允许拼写错误呢?我们可以将搜索中看到的常见拼写错误与正确的拼写相关联,然后搜索原始术语+更正的术语,但这维护起来很耗时。
有什么方法可以编程吗?
编辑
看来 SOUNDEX 可以提供帮助,但是任何人都可以给我一个使用 soundex 的示例,其中输入搜索词:
Dinosores wrocks
返回记录而不是执行:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
这将返回squaoosh?
Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
如果您使用的是 SQL Server,请查看 SOUNDEX。
对于您的示例:
返回相同的值 (D526) 。
您还可以使用 DIFFERENCE 函数(在与 soundex 相同的链接上)来比较相似程度(4 表示最相似,0 表示最相似)。
编辑:
在寻找多文本选项后,似乎这并不那么容易。我建议您参考 @Neil Knight 提供的 Fuzzt Logic 答案的链接(对我来说+1!)。
这篇 stackoverflow 文章还详细介绍了 TSQL 中模糊逻辑实现的可能来源。有一次,受访者还概述了全文索引作为您可能想要研究的潜力。
If you're using SQL Server, have a look at SOUNDEX.
For your example:
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by @Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
也许您的 RDBMS 有
SOUNDEX
函数?你没有提到这里涉及哪一个。SOUNDEX
Perhaps your RDBMS has a
SOUNDEX
function? You didn't mention which one was involved here.SOUNDEX
只是为了抛出一个替代方案。如果可以选择 SSIS,那么您可以使用模糊查找。
SSIS 模糊查找
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
我不确定引入单独的“搜索引擎”是否可行,但如果您查看 Google Search Appliance 或 Autonomy 等产品,您会发现这些产品可以索引 SQL 数据库并提供更多搜索选项 - 例如,还可以处理拼写错误如同义词、搜索结果权重、替代搜索建议等。
此外,SQL Server 的全文搜索功能可以配置为使用同义词库,这可能会有所帮助:
http://msdn.microsoft.com/en-us/library/ms142491。 :
这是某人设置同义词库来处理常见拼写错误的另一个问题
SQL Server 中的 FORMSOF 同义词库
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
简而言之,大多数 SQL 引擎都没有内置任何东西可以对“胖手指”进行基于字典的纠正。 SoundEx 确实可以作为一种工具来查找听起来相似的单词,从而纠正语音拼写错误,但如果用户输入“Dinosars”时漏掉了最后的 U,或者确实“笨手笨脚”地输入了“Dinosayrs”,SoundEx 就会不返回精确匹配。
听起来您想要 Google 搜索级别的内容“您的意思是__吗?”特征。我可以告诉你,这并不像看上去那么简单。在 10,000 英尺的高度,搜索引擎会查看每个关键字,看看它是否在已知“好”搜索词的“字典”中。如果不是,它会使用类似于拼写检查建议的算法来查找最接近匹配的字典单词(需要最少的字母替换、添加、删除和换位才能将给定单词转换为字典单词)。这将需要一些繁重的过程代码,无论是在数据库中的存储过程或 CLR Db 函数中,还是在业务逻辑层中。
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
您还可以尝试使用
SubString()
来消除前 3 个左右的字符。下面是如何实现这一目标的示例You can also try the
SubString()
, to eliminate the first 3 or so characters . Below is an example of how that can be achieved