MySQL: Special search algorithm with relevance ranking using MySQL search
I'm trying to do a search in MySQL where the user just has one field. The table looks like this:
ID BIGINT
TITLE TEXT
DESCRIPTION TEXT
FILENAME TEXT
TAGS TEXT
ACTIVE TINYINT
Now, if the user enters just blah blubber, the search must check whether each word appears in the fields TITLE, DESCRIPTION, FILENAME or TAGS. The results should be ordered by relevance, i.e. by how often the search words appear in the record. I have this example data:
ID | TITLE | DESCRIPTION | FILENAME | TAGS | ACTIVE
1 | blah | blah | bdsai | bdha | 1
2 | blubber | blah | blah | adsb | 1
3 | blah | dsabsadsab | dnsa | dsa | 1
In this example, ID 2 must be at the top (2x blah, 1x blubber), then 1 (2x blah) and then 3 (1x blah). This should be dynamic, so the user can also enter more words, and the relevance ranking should work the same way for one word as for several.
Is it possible to do this in MySQL alone, or do I have to use some PHP? How exactly would this work?
Thank you very much for your help! Regards, Florian
EDIT: Here is the result after I tried the answer of Tom Mac:
I have four records which look like this:
ID | TITLE | DESCRIPTION | FILENAME | TAGS | ACTIVE
1 | s | s | s | s | 1
2 | 0 | fdsadf | sdfs | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1
3 | 0 | s | s | s | 1
4 | a | a | a | a | 1
Now, if I search for the string s, I should get only the top three records, ordered by the relevance of s. That is, the records should be ordered like this:
ID | TITLE | DESCRIPTION | FILENAME | TAGS | ACTIVE
2 | 0 | fdsadf | sdfs | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1 <== 8x s
1 | s | s | s | s | 1 <== 4x s
3 | 0 | s | s | s | 1 <== 3x s
Now, I tried my query like this (the table's name is PAGES):
select t.*
from (
    select
        match(title) against('*s*' in boolean mode)
        + match(description) against('*s*' in boolean mode)
        + match(filename) against('*s*' in boolean mode)
        + match(tags) against('*s*' in boolean mode)
        as matchrank,
        bb.*
    from pages bb) t
where t.matchrank > 0
order by t.matchrank desc
This query returns this:
matchRank | ID | TITLE | DESCRIPTION | FILENAME | TAGS | ACTIVE
2 | 2 | 0 | fdsadf | sdfs | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1
Is this because of the wildcards? I would expect the string *s* to also match a value that is just s...
Comments (2)
This might help you out. It does kinda assume that your MySQL table uses the MyISAM engine though:
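The query that originally followed this sentence is not preserved here. As a rough sketch only, pieced together from the query the asker reports trying in the question's edit (the table name pages, the alias bb and the matchrank column come from there, not from this answer), a boolean-mode ranking for the search blah blubber might look like this:

```sql
-- Sketch, not the answer's original query.
-- Assumes the pages table from the question.
SELECT t.*
FROM (
    SELECT
        -- One MATCH per column; the four scores are summed into one rank.
        MATCH(title)       AGAINST('blah blubber' IN BOOLEAN MODE)
      + MATCH(description) AGAINST('blah blubber' IN BOOLEAN MODE)
      + MATCH(filename)    AGAINST('blah blubber' IN BOOLEAN MODE)
      + MATCH(tags)        AGAINST('blah blubber' IN BOOLEAN MODE)
        AS matchrank,
        bb.*
    FROM pages bb
) t
WHERE t.matchrank > 0          -- drop rows that matched nothing
ORDER BY t.matchrank DESC;     -- best matches first
```

The search words are passed without surrounding asterisks here. In boolean mode the asterisk is a truncation operator that is appended to a word (for example blah*), so a leading asterisk does not act as a wildcard.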
EDIT
Another assumption this solution makes is that the string you're searching for is at least 4 characters long. If the search string (i.e. 'blubber' or 'blah') might be 1, 2 or 3 characters long, you can edit your my.cnf file and add
ft_min_word_len=1
under the [mysqld] configuration options. Then restart MySQL and you should be good to go.
One final thing: if you are considering this approach, you should add a FULLTEXT INDEX to each of the columns. Hence:
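The statement that followed "Hence:" is not preserved here. A minimal sketch of what adding the indexes could look like, assuming the table is named pages as in the question; the index names are hypothetical:

```sql
-- One FULLTEXT index per searched column.
-- Index names (ft_title, ...) are made up for illustration.
ALTER TABLE pages ADD FULLTEXT INDEX ft_title (title);
ALTER TABLE pages ADD FULLTEXT INDEX ft_description (description);
ALTER TABLE pages ADD FULLTEXT INDEX ft_filename (filename);
ALTER TABLE pages ADD FULLTEXT INDEX ft_tags (tags);
```

If FULLTEXT indexes already existed before ft_min_word_len was changed, they need to be rebuilt before the new minimum length takes effect; for a MyISAM table, REPAIR TABLE pages QUICK; is the usual way to do that.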
You can find more details on BOOLEAN FULLTEXT searching in the MySQL Docs.
Rather than searching 'in boolean mode', use Match() Against() to determine a score. Add those scores up to get relevance.
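A minimal sketch of that idea, assuming the pages table from the question and a separate FULLTEXT index on each of the four columns (the alias p and the column name relevance are illustrative):

```sql
-- Sketch only: natural language mode is the default when no modifier is given.
-- Each MATCH ... AGAINST returns a floating-point score; the four are summed.
SELECT t.*
FROM (
    SELECT
        MATCH(title)       AGAINST('blah blubber')
      + MATCH(description) AGAINST('blah blubber')
      + MATCH(filename)    AGAINST('blah blubber')
      + MATCH(tags)        AGAINST('blah blubber')
        AS relevance,
        p.*
    FROM pages p
) t
WHERE t.relevance > 0
ORDER BY t.relevance DESC;
```

One caveat when trying this on a tiny test table: on MyISAM, natural language searches ignore words that occur in 50% or more of the rows, so with only three or four rows the scores may all come back as zero.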