MySQL: Special search algorithm using MySQL relevance search

Posted on 2024-12-11 17:06:06 · 2489 characters · 0 views · 0 comments

I'm trying to do a search in MySQL where the user just has one field. The table looks like this:

ID          BIGINT
TITLE       TEXT
DESCRIPTION TEXT
FILENAME    TEXT
TAGS        TEXT
ACTIVE      TINYINT

Now if the user inputs just blah blubber, the search must check whether every word appears in the fields TITLE, DESCRIPTION, FILENAME or TAGS. The result itself should be ordered by relevance, i.e. by how often a string appears in the record. I have this example data:

ID   | TITLE   | DESCRIPTION  | FILENAME | TAGS | ACTIVE
1    | blah    | blah         | bdsai    | bdha | 1
2    | blubber | blah         | blah     | adsb | 1
3    | blah    | dsabsadsab   | dnsa     | dsa  | 1

In this example, ID 2 must be at the top (2x blah, 1x blubber), then 1 (2x blah) and then 3 (1x blah). This process should be dynamic, so the user can also input more words and the relevance works the same way with one word or several.
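
The ordering described above can be checked with a small application-side sketch. It is written in Python rather than PHP purely for illustration, and `FIELDS` and `rank_records` are made-up names that are not part of the original post. It assumes that "relevance" means the total number of substring occurrences of each search word across the four text fields.

```python
# Sketch: rank records by how often each search word occurs (substring count).
# FIELDS and rank_records are illustrative names, not part of the original post.
FIELDS = ("title", "description", "filename", "tags")

def rank_records(records, query):
    """Return (score, record) pairs, highest score first, dropping zero scores."""
    words = query.lower().split()
    scored = []
    for rec in records:
        # str.count counts non-overlapping occurrences of the word in the field.
        score = sum(str(rec[f]).lower().count(w) for f in FIELDS for w in words)
        if score > 0:
            scored.append((score, rec))
    # Sort by score descending; ties keep their original order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

records = [
    {"id": 1, "title": "blah", "description": "blah", "filename": "bdsai", "tags": "bdha"},
    {"id": 2, "title": "blubber", "description": "blah", "filename": "blah", "tags": "adsb"},
    {"id": 3, "title": "blah", "description": "dsabsadsab", "filename": "dnsa", "tags": "dsa"},
]

ranking = [(rec["id"], score) for score, rec in rank_records(records, "blah blubber")]
print(ranking)  # [(2, 3), (1, 2), (3, 1)]
```

Because the query is simply split on whitespace, adding more words changes nothing in the logic; each extra word just contributes its own counts to the score.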

Is this possible to realize only in MySQL, or do I have to use some PHP? How would this work exactly?

Thank you very much for your help! Regards, Florian

EDIT: Here is the result after I tried the answer of Tom Mac:

I have four records which look like this:

ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
1   | s     | s           | s        | s                             | 1
2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1
3   | 0     | s           | s        | s                             | 1
4   | a     | a           | a        | a                             | 1

Now, if I search for the string s, I should only get the top three records, ordered by the relevance of s. This means the records should be ordered like this:

ID | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2  | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1        <== 8x s
1  | s     | s           | s        | s                             | 1        <== 4x s
3  | 0     | s           | s        | s                             | 1        <== 3x s
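
The 8x / 4x / 3x figures above are substring counts (for example, the s inside sdfs and sada is counted). One way to sketch that kind of count in plain SQL is the LENGTH/REPLACE trick: remove the search string from the column and see how much shorter the value became. The example below runs against an in-memory SQLite table through Python's `sqlite3` module only so that it is runnable without a server; the helper `occurrences` is a hypothetical name. MySQL also provides LENGTH and REPLACE, so the expression should carry over, though details differ (MySQL's LENGTH counts bytes where CHAR_LENGTH counts characters, and `/` returns a decimal there).

```python
import sqlite3

# Sketch only: an in-memory SQLite table stands in for the PAGES table so the
# example runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, description TEXT,"
    " filename TEXT, tags TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "s", "s", "s", "s", 1),
        (2, "0", "fdsadf", "sdfs", "a,b,c,d,e,f,s,a,a,s,s,as,sada", 1),
        (3, "0", "s", "s", "s", 1),
        (4, "a", "a", "a", "a", 1),
    ],
)

# Occurrences of :w in a column = (length - length with :w removed) / length(:w).
def occurrences(column):
    return f"(LENGTH({column}) - LENGTH(REPLACE({column}, :w, ''))) / LENGTH(:w)"

# Column names are fixed in code; only the search string is a bound parameter.
score = " + ".join(occurrences(c) for c in ("title", "description", "filename", "tags"))
sql = (
    f"SELECT t.id, t.matchrank FROM (SELECT id, {score} AS matchrank FROM pages) t "
    "WHERE t.matchrank > 0 ORDER BY t.matchrank DESC"
)
rows = conn.execute(sql, {"w": "s"}).fetchall()
print(rows)  # [(2, 8), (1, 4), (3, 3)]
```

This scans every row, so it is a sketch of the counting logic rather than something tuned for large tables.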

Now, I tried my query like this (the table's name is PAGES):

select t.*
  from (
        select
              match(title) against('*s*' in boolean mode)
            + match(description) against('*s*' in boolean mode)
            + match(filename) against('*s*' in boolean mode)
            + match(tags) against('*s*' in boolean mode)
            as matchrank,
              bb.*
          from pages bb) t
 where t.matchrank > 0
 order by t.matchrank desc

This query returns this:

matchRank | ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2         | 2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1

Is this because of the wildcards? I would have thought that the string *s* would also match a value that is just s ...

Comments (2)

眼趣 2024-12-18 17:06:06

This might help you out. It does kinda assume that your MySQL table uses the MyISAM engine though:

create table blubberBlah (id int unsigned not null primary key auto_increment,
title varchar(50) not null,
description varchar(50) not null,
filename varchar(50) not null,
tags varchar(50) not null,
active tinyint not null
) engine=MyISAM;

insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','blah','bdsai','bdha',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blubber','blah','blah','adsb',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','dsabsadsab','dnsa','dsa',1);

select t.*
from
(
 select MATCH (title) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (description) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (filename) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (tags) AGAINST ('blubber blah' IN BOOLEAN MODE) as matchRank,
       bb.*
from blubberBlah bb
) t
order by t.matchRank desc;
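
Since the question asks for the search to work with any number of words, the query above can be generated from the user's input. The following is a minimal sketch, assuming a MySQL driver that uses `%s` placeholders; `build_search_query` and `COLUMNS` are hypothetical names, and the `WHERE t.matchRank > 0` filter is taken from the question's version of the query rather than from the query above. Note that characters such as `+`, `-`, `*` and `"` in the input would still be interpreted as boolean-mode operators.

```python
# Sketch: build the MATCH ... AGAINST ranking query for any number of words.
# build_search_query is a hypothetical helper; the table and column names are
# fixed in code (never taken from user input), and the search text is passed
# as a bound parameter using the %s placeholder style common to MySQL drivers.
COLUMNS = ("title", "description", "filename", "tags")

def build_search_query(user_input, table="blubberBlah"):
    # Collapse whitespace so "  blah   blubber " becomes "blah blubber".
    search = " ".join(user_input.split())
    rank = "\n     + ".join(
        f"MATCH ({col}) AGAINST (%s IN BOOLEAN MODE)" for col in COLUMNS
    )
    sql = (
        "SELECT t.*\n"
        f"FROM (SELECT {rank} AS matchRank, bb.*\n"
        f"      FROM {table} bb) t\n"
        "WHERE t.matchRank > 0\n"
        "ORDER BY t.matchRank DESC"
    )
    # One bound parameter per MATCH clause.
    params = (search,) * len(COLUMNS)
    return sql, params

sql, params = build_search_query("  blah   blubber foo ")
print(sql)
print(params)
```

The returned pair would then be handed to the driver's execute call, e.g. `cursor.execute(sql, params)`, so the search text never has to be concatenated into the SQL string.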

EDIT

Another assumption this solution makes is that the strings you're searching for are at least 4 characters long, because MyISAM full-text search ignores words shorter than ft_min_word_len (4 by default). If a search word such as 'blubber' or 'blah' could be 1, 2 or 3 characters long, you can add ft_min_word_len=1 under the [mysqld] section of your my.cnf file. Then restart MySQL and rebuild the FULLTEXT indexes (for example with REPAIR TABLE blubberBlah QUICK) so that the new setting takes effect.

One final thing: if you are considering using this approach then you should add a FULLTEXT INDEX to each of the columns. Hence:

ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx1`(`title`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx2`(`description`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx3`(`filename`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx4`(`tags`);

You can find more details on BOOLEAN FULLTEXT searching in the MySQL Docs.

四叶草在未来唯美盛开 2024-12-18 17:06:06

Rather than searching 'in boolean mode', use Match() Against() to determine a score. Add those scores up to get relevance.
