How to make PostgreSQL text search use search term order in ranking

Posted on 2024-10-19 04:06:36

The following PostgreSQL text search query

select 
    ID, DISPLAY_NAME, 
    ts_rank_cd(to_tsvector('english', display_name), query) as RANK
from 
    my_table, 
    to_tsquery('english', 'John:*&Bernard:*') as query
where
    to_tsvector('english', display_name) @@ query
order by RANK DESC

produces

ID    DISPLAY_NAME         RANK   
=====================================
82683 "BERNARD JOHN SMBZh" 0.05
63815 "BERNARD JOHN []zkP" 0.05
68204 "BERNARD JOHN uPmYB" 0.05
29666 "John Bernard iECx"  0.05
44256 "John Bernard DpIff" 0.05
52601 "BERNARD JOHN ivRTX" 0.05
80250 "BERNARD JOHN b'nVp" 0.0430677

but what I would really like is for the "John Bernard*" records to rank higher, because the terms in the "document" appear in the same order as in the query. Is this possible?

For example, a result like this:

ID    DISPLAY_NAME         RANK   
=====================================
29666 "John Bernard iECx"  0.10
44256 "John Bernard DpIff" 0.10
82683 "BERNARD JOHN SMBZh" 0.05
63815 "BERNARD JOHN []zkP" 0.05
68204 "BERNARD JOHN uPmYB" 0.05
52601 "BERNARD JOHN ivRTX" 0.05
80250 "BERNARD JOHN b'nVp" 0.0430677

Cheers
Craig

Comments (1)

时光是把杀猪刀 2024-10-26 04:06:36

I think you will have to combine tsearch with another ranking mechanism, because ts_rank_cd ignores the order in which the query terms appear in the document.

How about something like:

-- Sample data for the demonstration.
create table my_table(id serial primary key, display_name text);
insert into my_table(display_name) values ('John Bernard iECx'),
                                          ('John Bernard DpIff'),
                                          ('BERNARD JOHN SMBZh'),
                                          ('BERNARD JOHN b''nVp');

-- Double the rank when the raw text contains the terms in query order
-- (~* is a case-insensitive regular expression match).
select
    ID, DISPLAY_NAME,
    ts_rank_cd(to_tsvector('english', display_name), query)
      * case when display_name ~* 'john bernard' then 2 else 1 end as RANK
from
    my_table,
    to_tsquery('english', 'John:*&Bernard:*') as query
where
    to_tsvector('english', display_name) @@ query
order by RANK DESC;

producing:

 id |    display_name    |       rank
----+--------------------+-------------------
  1 | John Bernard iECx  | 0.200000002980232
  2 | John Bernard DpIff | 0.200000002980232
  3 | BERNARD JOHN SMBZh | 0.100000001490116
  4 | BERNARD JOHN b'nVp | 0.100000001490116
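
If you are on PostgreSQL 9.6 or later, a variant of the same idea might replace the regular expression with the tsquery "followed by" operator (<->), so that the order test goes through the same text search parsing as the main match. This is only a sketch under that version assumption: the alias names doc and ordered_query are made up for illustration, the multiplier of 2 is carried over from the query above, and the sample rows are inlined so the statement runs on its own.

```sql
-- Sketch only: assumes PostgreSQL 9.6+, where tsquery supports <-> (followed by).
-- The CTE stands in for my_table so the statement is self-contained.
with my_table(id, display_name) as (
    values (1, 'John Bernard iECx'),
           (2, 'John Bernard DpIff'),
           (3, 'BERNARD JOHN SMBZh'),
           (4, 'BERNARD JOHN b''nVp')
)
select
    id, display_name,
    -- Base rank from the unordered AND query, doubled when the
    -- ordered (phrase) query also matches the same document.
    ts_rank_cd(doc, query)
      * case when doc @@ ordered_query then 2 else 1 end as rank
from
    my_table,
    to_tsvector('english', display_name) as doc,                   -- hypothetical alias
    to_tsquery('english', 'John:* & Bernard:*') as query,
    to_tsquery('english', 'John:* <-> Bernard:*') as ordered_query -- hypothetical alias
where
    doc @@ query
order by rank desc, id;
```

With this shape, rows where a "john" prefix is immediately followed by a "bernard" prefix should receive the doubled rank, while the where clause still admits both orderings. The boost factor is arbitrary and can be tuned.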