使用 Levenshtein 编辑距离的 Google 风格搜索建议

发布于 2024-10-09 02:36:08 字数 1578 浏览 0 评论 0原文

好的,大家正在使用 jQuery-UI AutoComplete 以及来自 sql-sever 2008 db 的结果来处理搜索建议。使用 AdventureWorks DB Products 表进行测试。我想在这个例子中跨两个字段进行搜索。产品编号和名称。

我之前问了两个与此相关的问题...此处此处

到目前为止我已经想出了这个...

CREATE procedure [dbo].[procProductAutoComplete]
(
    @searchString nvarchar(100)
)
as
begin

    declare @param nvarchar(100);
    set @param = LOWER(@searchString);

WITH Results(result)
AS
(
    select TOP 10 Name as 'result'
    from Production.Product 
    where LOWER(Name) like '%' + @param + '%' or (0 <= dbo.lvn(@param, LOWER   (Name), 6))
    union
    select TOP 10 ProductNumber as 'result'
    from Production.Product
    where LOWER(ProductNumber) like '%' + @param + '%' or (0 <= dbo.lvn(@param,  LOWER(ProductNumber), 6))
)

SELECT TOP 20 * from Results

end;

我现在的问题是结果的排序...我得到了正确的结果,但它们只是按名称或产品编号排序,与输入字符串无关。 ..

例如,我可以搜索以“BZ-”开头的产品编号,返回的顶部结果是以“A”开头的 ProductNums,尽管我确实在列表中的其他位置获得了更相关的结果。.

根据以下方式对结果进行排序的任何想法与搜索字符串的相关性?

编辑:

关于levenschtein距离的tql实现,可以在此处找到(链接到上一个问题)...

我想知道确定发送到函数的 MAX 值的最佳方法是什么(在上面的示例中为 6)

最好根据“似乎”效果良好的值选择任意值对于我给定的数据集?或者最好根据输入字符串的长度动态调整它...

我最初的想法是值应该与 searchString 的长度成反比...所以随着搜索字符串的增长并变得越来越多具体..容忍度降低...想法?

Ok guys working on search suggestions using jQuery-UI AutoComplete with results from sql-sever 2008 db. Using AdventureWorks DB Products table for testing. I want to search across 2 fields in this example. ProductNumber and Name.

I asked 2 questions earlier relating to this...here and here

and ive come up with this so far...

CREATE procedure [dbo].[procProductAutoComplete]
(
    @searchString nvarchar(100)
)
as
begin

    declare @param nvarchar(100);
    set @param = LOWER(@searchString);

WITH Results(result)
AS
(
    select TOP 10 Name as 'result'
    from Production.Product 
    where LOWER(Name) like '%' + @param + '%' or (0 <= dbo.lvn(@param, LOWER   (Name), 6))
    union
    select TOP 10 ProductNumber as 'result'
    from Production.Product
    where LOWER(ProductNumber) like '%' + @param + '%' or (0 <= dbo.lvn(@param,  LOWER(ProductNumber), 6))
)

SELECT TOP 20 * from Results

end;

My problem now is ordering of the results...I am getting the correct results but they are just ordered by the Name or product number and are not relevant to the input string...

for example I can search for product Number starting with "BZ-" and the top returned results are ProductNums starting with "A" although I do get more relevant results elsewhere in the list..

any ideas for sorting the results in terms of relevance to the search string??

EDIT:

in regards to the tql implementation of the levenschtein distance found here(linked to in previous question)...

I am wondering what would be the best way to determine the MAX value to send to the function (6 in my example above)

Would it be best to choose an arbitrary value based on what "seems" to work well for my given data set? or would it be best to adjust it dynamically based on the length of the input string...

My initial thoughs were that the value to should be inverely proportional to the length of the searchString...so as the search string grows and becomes more specific..the tolerance decreases...thoughts??

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

一袭白衣梦中忆 2024-10-16 02:36:08

使用 SQL Server 时,全文搜索功能似乎是最佳选择

The Full Text Search feature seems to be the way go when using SQL Server

抠脚大汉 2024-10-16 02:36:08

相关性是 dbo.lvn() 的结果。它返回将一个字符串转换为另一字符串所需的操作量。所以答案很简单:

ORDER BY dbo.lvn(@param, LOWER (Name), 6)

但这不能与 LIKE 结合使用,就像这样不返回任何相关值。但使用 LIKE 根本不是一个好主意。如果有人给“tooth”小费来购买“牙膏”,他会得到“蓝牙”作为建议。

为了使 devlim 更快,请阅读此处:
https://stackoverflow.com/a/14261807/318765

The relevance is the result of dbo.lvn(). It returns the amount of operations need to transform one string into the other. So the answer is simple:

ORDER BY dbo.lvn(@param, LOWER (Name), 6)

But this won't work in combination to the LIKE as this does not return any relevance value. But the usage of LIKE is not a good idea at all. If someone is tiping "tooth" to buy "toothpaste" he would get "bluetooth" as proposal.

To make devlim faster read here:
https://stackoverflow.com/a/14261807/318765

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文