MS SQL - WHERE 子字符串匹配是语音的?
我想制作一个基于“听起来像”匹配进行搜索的搜索功能。
例如,假设我有一个看起来像这样的公司列表(假设我们也生活在 Bizzaro 世界中):
- Acme
- Already allusion cite LTD
- 所有准备好的错觉网站 INC
- Apart assent
- Assentsight
(或者与名称类似的东西... George
或 Jeorge
?“Yah-way”或“ye-hova”?)
当有人搜索“听起来像”soundex("site") 的内容时== S230
,他们也应该看到“Sight”的结果。
大多数使用过 soudnex 的人都知道,普通的子字符串匹配显然不会这样做。
我试图在脑海中弄清楚如何创建一个可以基于此进行匹配的 WHERE 子句,因此我不想运行典型的 WHERE company LIKE input
,而是运行 soundex。显然,如果我对整个公司名称运行 soundex,我将无法进行子字符串搜索(例如,搜索“ALL”的用户永远不会匹配“All ready”的 soundex)。对每个单词进行 Soundex 拆分可能也不值得,因此我不确定运行 soundex 的所有组合是否是一个好主意……或者即使这在具有超过 1000 条记录的数据库中在计算上是可行的。
基本上,我想要的互动是当(在办公室或其他地方)汤姆对莎莉说“这个名字类似于Rebekkah Schwartzkopff”,并且可以通过语音搜索模糊匹配。
显然,由于 soudnex,我们会遇到非英文名称公司的问题,但我会在这一问题上做出妥协。
我想在不向数据库或存储过程添加任何内容的情况下执行此操作。
I'd like to make a search feature that searches based on "sounds like" match.
For instance, lets say I have a company list that looks like this (lets say we live in Bizzaro world too):
- Acme
- Already allusion cite LTD
- All ready illusion site INC
- Apart assent
- Assent sight
(Or something simmilar with names... George
or Jeorge
? "Yah-way", or "ye-hova" ?)
When someone searches for something that "sounds like" the soundex("site") == S230
, they should see results for "Sight" also.
As most people who've used soudnex before already know, normal substring matches obviously don't do this.
I'm trying to work out in my head how to make a WHERE clause that can match based on this, so instead of a typical WHERE company LIKE input
, I'd like to run a soundex. Obviously if I run soundex on the whole company name, I won't be able to do substring searching (for example, a user searching "ALL" will never match a soundex of "All ready"). Soundex split on each word might not be worthwhile either, so I'm not sure running all combinations of a soundex is a good idea... or even if that's going to be computationally feasible in a database with more than 1000 records.
Basically the interaction I want to have is when (in an office or something) Tom says to Sally "That name was something like Rebekkah Schwartzkopff" and it can be searched phonetically for a fuzzy match.
Obviously we're going to run into issues with non-English named companies because of soudnex, but I'm will to compromise on this one.
I'd like to do this without adding anything to the database, or a stored procedure.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果 SOUNDEX 对于您正在做的事情来说是一个好的开始,您可以使用 差异。
例如:
请注意,DIFFERENCE 函数使用 0-4 的值返回两个字符串的 SOUNDEX 值之间的差异; 4 表示字符串非常接近相同,0 表示它们完全不同(对我来说有点向后缩放,但我认为它有效)。
If SOUNDEX is a good beginning for what you are doing, you can use DIFFERENCE.
eg:
Note that the DIFFERENCE function returns the difference between the SOUNDEX values of two strings using a value of 0-4; 4 meaning the strings are pretty close to the same and 0 meaning they are completely different (kind of a backwards scale to me, but I suppose it works).
非常有趣的问题。我做了一些探索,发现了这个:
http://www.codeproject.com/KB /database/dmetaphone4.aspx
我自己还没有测试过,但似乎值得一试。
它需要您向数据库添加一些内容,但我不知道如何使用内置 SQL Server 功能来实现您想要的功能...
Very interesting question. I did a little poking around and found this:
http://www.codeproject.com/KB/database/dmetaphone4.aspx
I haven't tested it myself but it seems like it would be worth checking out.
It would require you to add something to the database, but I don't see how you can implement the functionality you want with built in SQL Server functionality...