sql 数据库可以返回 sql 结果集以及结果的分数吗?

发布于 2024-10-15 11:52:38 字数 330 浏览 0 评论 0原文

只是好奇,如果我想将字符串发送到数据库(可能是 MS SQL Server),任何人都可以提供有关从数据库返回结果的最佳方法的任何见解,其中结果集可能会在其上进行排序和“评分”与传入的字符串的接近程度?

那么,如果我发送一个查询:,

SELECT name FROM table where name LIKE 'Jon'

然后得到 1000 个结果,如下所示:

100 Jon
98  John
80  Jonathan
32  Nathan

视图、索引、存储过程、编码解决方案?有什么建议?

Just curious, if I wanted to send strings to a database (perhaps for MS SQL Server) can anyone provide any insight on what the best way would be to return results from a database where the result set might be sorted and "scored" on its closeness to the string passed in?

So, if I sent a query for :

SELECT name FROM table where name LIKE 'Jon'

and then get a result of 1000 results that looks like:

100 Jon
98  John
80  Jonathan
32  Nathan

Views, indexes, stored procedures, coded solution? What is the recommendation?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

萤火眠眠 2024-10-22 11:52:38

你可以,但你需要使用另一个函数来完成它。 Levenshtein 比率或 Jaro 距离 将是最常见的解决方案。我不确定 SQL Server 为此内置了什么(如果有的话)。如果不出意外,我认为您可以使用 SimMetrics 库,如此处。无论如何,它看起来就像这样。

select top 1000
jaro('John', name) as score, name
from table
where name like '%John%'
order by 1 desc

You could, but you'd need to use another function to do it. Levenshtein ratio or Jaro distance would be the most common solutions. I'm not sure what, if anything, SQL Server includes builtin for this. If nothing else I think you can use the SimMetrics library as described here. Regardless, it would look something like this.

select top 1000
jaro('John', name) as score, name
from table
where name like '%John%'
order by 1 desc
爱的故事 2024-10-22 11:52:38

编辑

由于评论中的一些持续的刺激,我在这里展示了 SQL 中 Levenshtein 距离计算的实现。这里使用 SQL Server 2005+ 的 TSQL,但该技术也可以转换到其他 DBMS。 最高得分为 100

;with tbl as (
    select 'Jon' AS Name union all
    select 'Jonathan' union all
    select 'Jonny' union all
    select 'John' union all
    select 'Bone' union all
    select 'BJon' union all
    select 'Nathan' union all
    select 'Jonne')
SELECT *, SCORE_Levenshtein + SCORE_SOUNDEX TotalScore
FROM
(
SELECT name,
    CAST(50 /
    (
        select 1.0 + MAX(LDist)
        FROM
        (
            select startAt.number,
                LEN(longer) -
                sum(case when SUBSTRING(longer, startAt.number+offset.number, 1)
                            = SUBSTRING(shorter, 1+offset.number, 1) then 1 else 0 end ) LDist
            FROM
            (select case when LEN(Name) < LEN(LookFor) then Name else LookFor end shorter) shorter
            cross join
            (select case when LEN(Name) >= LEN(LookFor) then Name else LookFor end longer) longer
            inner join master..spt_values startAt
                on startAt.type='P' and startAt.number between 1 and len(longer) - LEN(shorter) + 1
            inner join master..spt_values offset
                on offset.type='P' and offset.number between 0 and LEN(shorter)-1
            group by startAt.number, longer, shorter
        ) X
    ) AS NUMERIC(16,4)) SCORE_Levenshtein
    ,
    CAST(50 / (5-  -- inversely proportional to soundex difference
    (
    SELECT 0.0 +
    case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
    +
    case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
    +
    case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
    +
    case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
    FROM (select soundex(name) as A, SOUNDEX(LookFor) as B) X
    )) AS NUMERIC(16,4)) AS SCORE_SOUNDEX
FROM tbl
CROSS JOIN (SELECT 'Jon' as LookFor) LookFor
) Scored
Order by SCORE_Levenshtein + SCORE_SOUNDEX DESC

注意 - 使用此行 CROSS JOIN (SELECT 'Jon' as LookFor) LookFor 是为了使输入 'Jon' 不需要在查询中重复多次。还可以定义一个变量,并在查询中使用 LookFor 的地方使用它。

输出

值得注意的是,与 SOUNDEX 一起使用时,Jonny 的得分高于 Bone,而单独使用 Levenshtein 则不会发生这种情况。

name      SCORE_Levenshtein  SCORE_SOUNDEX   TotalScore
Jon       50.0000            50.0000         100.0000
John      12.5000            50.0000          62.5000
Jonny      8.3333            50.0000          58.3333
Jonne      8.3333            50.0000          58.3333
Bone      10.0000            25.0000          35.0000
BJon      10.0000            12.5000          22.5000
Jonathan   5.5556            16.6667          22.2223
Nathan     7.1429            12.5000          19.6429


原始答案如下,基于 LIKE '%x%' 对输入进行预过滤,将 Levenshtein 折叠为简单的 Len(column) - Len(Like-expression) 计算

看一下这个示例 - 它测试长度和SOUNDEX 差异,因缺乏更好的措施。

最高得分为 100。

;with tbl as (
    select 'Jon' AS Name union all
    select 'Jonathan' union all
    select 'Jonny' union all
    select 'John' union all  -- doesn't match LIKE
    select 'BJon' union all
    select 'Jonne')
SELECT name,
     50 / (Len(Name) - LEN('Jon') + 1.0)  -- inversely proportional to length difference
     +
     50 / (5-  -- inversely proportional to soundex difference
    (
    SELECT 0.0 +
    case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
    +
    case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
    +
    case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
    +
    case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
    FROM (select soundex(name) as A, SOUNDEX('Jon') as B) X
    )) AS SCORE
FROM tbl
where name LIKE '%Jon%'
Order by SCORE DESC

输出

name     SCORE
Jon      100.00000000000000000
Jonny    66.66666666666660000
Jonne    66.66666666666660000
BJon     37.50000000000000000
Jonathan 24.99999999999996666

EDIT

Due to some persistent prodding from the comments, I present here an implementation of the Levenshtein distance calculation in SQL. TSQL for SQL Server 2005+ is used here, but the technique can be converted to other DBMS as well. Maximum score is 100.

;with tbl as (
    select 'Jon' AS Name union all
    select 'Jonathan' union all
    select 'Jonny' union all
    select 'John' union all
    select 'Bone' union all
    select 'BJon' union all
    select 'Nathan' union all
    select 'Jonne')
SELECT *, SCORE_Levenshtein + SCORE_SOUNDEX TotalScore
FROM
(
SELECT name,
    CAST(50 /
    (
        select 1.0 + MAX(LDist)
        FROM
        (
            select startAt.number,
                LEN(longer) -
                sum(case when SUBSTRING(longer, startAt.number+offset.number, 1)
                            = SUBSTRING(shorter, 1+offset.number, 1) then 1 else 0 end ) LDist
            FROM
            (select case when LEN(Name) < LEN(LookFor) then Name else LookFor end shorter) shorter
            cross join
            (select case when LEN(Name) >= LEN(LookFor) then Name else LookFor end longer) longer
            inner join master..spt_values startAt
                on startAt.type='P' and startAt.number between 1 and len(longer) - LEN(shorter) + 1
            inner join master..spt_values offset
                on offset.type='P' and offset.number between 0 and LEN(shorter)-1
            group by startAt.number, longer, shorter
        ) X
    ) AS NUMERIC(16,4)) SCORE_Levenshtein
    ,
    CAST(50 / (5-  -- inversely proportional to soundex difference
    (
    SELECT 0.0 +
    case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
    +
    case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
    +
    case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
    +
    case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
    FROM (select soundex(name) as A, SOUNDEX(LookFor) as B) X
    )) AS NUMERIC(16,4)) AS SCORE_SOUNDEX
FROM tbl
CROSS JOIN (SELECT 'Jon' as LookFor) LookFor
) Scored
Order by SCORE_Levenshtein + SCORE_SOUNDEX DESC

Note - This line CROSS JOIN (SELECT 'Jon' as LookFor) LookFor is used so that the input 'Jon' does not need to be repeated many times in the query. One could also define a variable instead and use it where LookFor is used in the query.

Output

It is worth noting that together with SOUNDEX, Jonny gets to score higher than Bone which won't happen with Levenshtein alone.

name      SCORE_Levenshtein  SCORE_SOUNDEX   TotalScore
Jon       50.0000            50.0000         100.0000
John      12.5000            50.0000          62.5000
Jonny      8.3333            50.0000          58.3333
Jonne      8.3333            50.0000          58.3333
Bone      10.0000            25.0000          35.0000
BJon      10.0000            12.5000          22.5000
Jonathan   5.5556            16.6667          22.2223
Nathan     7.1429            12.5000          19.6429

Original answer follows, based on pre-filtering the input based on LIKE '%x%' which collapses the Levenshtein to a simple Len(column) - Len(Like-expression) calculation

Have a look at this example - it tests the length and SOUNDEX differences, for lack of better measures.

The maximum score is 100.

;with tbl as (
    select 'Jon' AS Name union all
    select 'Jonathan' union all
    select 'Jonny' union all
    select 'John' union all  -- doesn't match LIKE
    select 'BJon' union all
    select 'Jonne')
SELECT name,
     50 / (Len(Name) - LEN('Jon') + 1.0)  -- inversely proportional to length difference
     +
     50 / (5-  -- inversely proportional to soundex difference
    (
    SELECT 0.0 +
    case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
    +
    case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
    +
    case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
    +
    case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
    FROM (select soundex(name) as A, SOUNDEX('Jon') as B) X
    )) AS SCORE
FROM tbl
where name LIKE '%Jon%'
Order by SCORE DESC

Output

name     SCORE
Jon      100.00000000000000000
Jonny    66.66666666666660000
Jonne    66.66666666666660000
BJon     37.50000000000000000
Jonathan 24.99999999999996666
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文