MySQL full-text search relevance across multiple tables

Posted on 2024-12-29 08:58:38

I have been tasked with creating a site-wide search feature. The search needs to look at articles, events and page content.

I've used MATCH()/AGAINST() in MySQL before and know how to get the relevance of a result. As far as I know, though, the relevance is specific to each search (contents, number of rows, etc.), so the relevance of results from the articles table won't match the relevance of results from the events table.

Is there any way to unify the relevance so that results from all three tables have a comparable relevance?

Comments (2)

傾城如夢未必闌珊 2025-01-05 08:58:38

Yes, a search engine such as Apache Lucene or Solr can unify them very well.

http://lucene.apache.org/solr/

If you need to do it in MySQL alone, you can use a UNION. You'll probably want to suppress any results with zero relevance.

You'll also need to decide how the table a match comes from should affect its relevance.

For example, suppose you want articles to be the most important, events moderately important, and pages the least important. You can use multipliers like this:

set @articles_multiplier=3;
set @events_multiplier=2;
set @pages_multiplier=1;

Here's a working example you can try that demonstrates some of these techniques:

Create sample data:

create database d;
use d;

create table articles (id int primary key, content text) ENGINE = MYISAM;
create table events (id int primary key, content text) ENGINE = MYISAM;
create table pages (id int primary key, content text) ENGINE = MYISAM;

insert into articles values 
(1, 'Lorem ipsum dolor sit amet'),
(2, 'consectetur adipisicing elit'),
(3, 'sed do eiusmod tempor incididunt');

insert into events values 
(1, 'Ut enim ad minim veniam'),
(2, 'quis nostrud exercitation ullamco'),
(3, 'laboris nisi ut aliquip');

insert into pages values 
(1, 'Duis aute irure dolor in reprehenderit'),
(2, 'in voluptate velit esse cillum'),
(3, 'dolore eu fugiat nulla pariatur.');

Make it searchable:

ALTER TABLE articles ADD FULLTEXT(content);
ALTER TABLE events ADD FULLTEXT(content);
ALTER TABLE pages ADD FULLTEXT(content);

Use a UNION to search all these tables:

set @target='dolor';

SELECT * FROM (
  SELECT
    'articles' AS table_name,
    id,
    @articles_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM articles
  UNION ALL  -- table_name differs per branch, so no de-duplication is needed
  SELECT
    'events' AS table_name,
    id,
    @events_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM events
  UNION ALL
  SELECT
    'pages' AS table_name,
    id,
    @pages_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM pages
)
AS sitewide
WHERE relevance > 0        -- suppress rows that did not match
ORDER BY relevance DESC;   -- best matches first

The result:

+------------+----+------------------+
| table_name | id | relevance        |
+------------+----+------------------+
| articles   |  1 | 1.98799377679825 |
| pages      |  3 | 0.65545331108093 |
+------------+----+------------------+

活雷疯 2025-01-05 08:58:38

(Sorry, I wanted to leave this as a comment on the above answer, but I don't have enough reputation to comment.)

Be aware that a UNION inside a subquery is very poorly optimized. A common case is paginating the results with "LIMIT @page * 10, 10" in the parent query: MySQL must then fetch all the results from the subqueries before it can evaluate the parent query.
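
One possible way to soften this, offered as a sketch rather than a tested fix, is to cap each branch at the number of rows the requested page could need, so the outer query only merges a small set. The example below reuses the @target and multiplier variables from the answer above. The page number (page 3, ten rows per page) and the resulting literals 30 and 20 are assumptions chosen for illustration. They are written as literals because MySQL only accepts integer constants in LIMIT outside of prepared statements and stored programs.

```sql
-- Page 3 with 10 rows per page, i.e. rows 21-30 of the merged result.
-- Any row in the global top 30 is also in its own table's top 30,
-- so each branch can stop at offset + page size = 20 + 10 = 30 rows.
(SELECT 'articles' AS table_name, id,
        @articles_multiplier * MATCH(content) AGAINST (@target) AS relevance
   FROM articles
  WHERE MATCH(content) AGAINST (@target)   -- only rows that match at all
  ORDER BY relevance DESC, id
  LIMIT 30)
UNION ALL
(SELECT 'events' AS table_name, id,
        @events_multiplier * MATCH(content) AGAINST (@target) AS relevance
   FROM events
  WHERE MATCH(content) AGAINST (@target)
  ORDER BY relevance DESC, id
  LIMIT 30)
UNION ALL
(SELECT 'pages' AS table_name, id,
        @pages_multiplier * MATCH(content) AGAINST (@target) AS relevance
   FROM pages
  WHERE MATCH(content) AGAINST (@target)
  ORDER BY relevance DESC, id
  LIMIT 30)
-- Merge at most 90 rows; the extra sort keys keep page boundaries stable on ties.
ORDER BY relevance DESC, table_name, id
LIMIT 20, 10;
```

The per-branch limit grows with the page number, so deep pages still cost more than shallow ones. In application code the two LIMIT values would typically be computed from the page number and bound through a prepared statement.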
