Designing a "relevance-based" search for content?

Posted 2024-11-05 06:02:08

In my application (PHP/MySQL/JS), I have a search functionality built in. One of the search criteria contains checkboxes for various options, and as such, some results would be more relevant than others, should they contain more or less of each option.

For example, if the options are A and B and I search for both, Result 1, which contains only option A, is 50% relevant, while Result 2, which contains both A and B, is 100% relevant.

Previously I just ran simple SQL queries built from the form input, but this one is harder: it's not as simple as data LIKE "%query%", because some results are more valuable than others for a given search query.

I have absolutely no idea where to begin... does anybody have relevant (ha!) reading material to direct me to?

Edit: After mulling it over, I'm thinking something involving an SQL script to get the raw data, followed by many many rounds of parsing is something I'd have to do...

Nothing cacheable, though? :(

Comments (4)

心碎的声音 2024-11-12 06:02:08

Have a look at the Lucene project.
It is available in many languages.

This is the PHP port:
http://framework.zend.com/manual/en/zend.search.lucene.html

It indexes the items to be searched and returns relevance-weighted results, which works better than select x from y where name like '%pattern%' style searching.

紫﹏色ふ单纯 2024-11-12 06:02:08

What you need is a powerful search engine, like Solr. While you could implement this on top of MySQL, other tools already provide it out of the box.

︶葆Ⅱㄣ 2024-11-12 06:02:08

Here's an idea: do the comparisons and sum the results. The higher the sum, the more criteria match.

How about a (stupid) table like this:

  • name
  • dob_year
  • dob_month
  • dob_day

Find the person who shares the most of the three date components with 3/15/1980:

SELECT (dob_year = 1980) + (dob_month = 3) + (dob_day = 15) AS strength, name
FROM user
ORDER BY strength DESC
LIMIT 1

A good WHERE clause and index would be required to keep you from doing a table scan, but...

You could even add a weight to a column, e.g.

SELECT ((dob_year = 1980)*2)
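
To make the summed-comparison idea concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module rather than PHP/MySQL purely so it is self-contained; the sample rows, the double weight on the year, and the WHERE clause are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical sample rows; the table layout follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (name TEXT, dob_year INT, dob_month INT, dob_day INT)"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?, ?)",
    [
        ("alice", 1980, 3, 15),  # matches year, month and day
        ("bob", 1980, 7, 15),    # matches year and day
        ("carol", 1975, 3, 2),   # matches month only
        ("dave", 1990, 1, 1),    # matches nothing
    ],
)

# Each comparison evaluates to 1 or 0, so the sum counts matching columns.
# The year counts double; the WHERE clause drops rows that match nothing.
rows = conn.execute(
    """
    SELECT name,
           (dob_year = 1980) * 2 + (dob_month = 3) + (dob_day = 15) AS strength
    FROM user
    WHERE dob_year = 1980 OR dob_month = 3 OR dob_day = 15
    ORDER BY strength DESC, name
    """
).fetchall()

for name, strength in rows:
    print(name, strength)
# alice 4
# bob 3
# carol 1
```

The same SELECT expression should behave the same way in MySQL, where a comparison also evaluates to 1 or 0.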

Good luck.

野味少女 2024-11-12 06:02:08

Given your answer to my comment, here's an example on how you might do it:

First the tables:

CREATE TABLE `items` (
 `id` int(11) NOT NULL,
 `name` varchar(80) NOT NULL
);
CREATE TABLE `criteria` (
 `cid` int(11) NOT NULL,
 `option` varchar(80) NOT NULL,
 `value` int(1) NOT NULL
);

Then an example of some items and criteria:

INSERT INTO items (id, name) VALUES
(1,'Name1'),
(2,'Name2'),
(3,'Name3');

INSERT INTO criteria VALUES
(1,'option1',1) ,(1,'option2',1) ,(1,'option3',0),
(2,'option1',0) ,(2,'option2',1) ,(2,'option3',1),
(3,'option1',1) ,(3,'option2',0) ,(3,'option3',1);

This would create 3 items and 3 options and assign options to them.

Now there are multiple ways you can order by a certain "strength". The simplest would be:

SELECT i.*, c1.value + c3.value AS strength
FROM items i
JOIN criteria c1 ON c1.cid = i.id AND c1.option = 'option1'
JOIN criteria c3 ON c3.cid = i.id AND c3.option = 'option3'
ORDER BY strength DESC

This lists the items ranked by how many of option 1 and option 3 they have, so those with both options rank "higher" than those with only one.

This works well if you're searching on 2 options. But suppose you search on all 3 options: with the sample data every item now has the same strength. This is why it's important to assign "weights" to options.
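
The tie is easy to check. The sketch below loads the sample tables into an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL, chosen only so the example is self-contained) and builds one JOIN per selected option; the helper name `strengths` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE criteria (
      cid INTEGER NOT NULL, option TEXT NOT NULL, value INTEGER NOT NULL
    );
    INSERT INTO items (id, name) VALUES (1,'Name1'), (2,'Name2'), (3,'Name3');
    INSERT INTO criteria VALUES
      (1,'option1',1), (1,'option2',1), (1,'option3',0),
      (2,'option1',0), (2,'option2',1), (2,'option3',1),
      (3,'option1',1), (3,'option2',0), (3,'option3',1);
    """
)


def strengths(options):
    """Return (name, strength) rows for the selected option names."""
    count = range(len(options))
    # One JOIN per selected option, mirroring the query above.
    joins = " ".join(
        f"JOIN criteria c{n} ON c{n}.cid = i.id AND c{n}.option = ?"
        for n in count
    )
    total = " + ".join(f"c{n}.value" for n in count)
    sql = (
        f"SELECT i.name, {total} AS strength FROM items i {joins} "
        "ORDER BY strength DESC, i.id"
    )
    return conn.execute(sql, options).fetchall()


two = strengths(["option1", "option3"])
three = strengths(["option1", "option2", "option3"])
print(two)    # [('Name3', 2), ('Name1', 1), ('Name2', 1)]
print(three)  # [('Name1', 2), ('Name2', 2), ('Name3', 2)]
```

Only the alias numbers are interpolated into the SQL text; the option names are passed as bound parameters.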

You could store the weight in the value column itself, but that won't help if your queries don't always assign the same weights to the same options. Per-query weights are easy to apply instead, with a query like this:

SELECT i.* , IF(c1.value, 2, 0) + IF(c3.value, 1, 0) AS strength
FROM items i
JOIN criteria c1 ON c1.cid = i.id AND c1.option = 'option1'
JOIN criteria c3 ON c3.cid = i.id AND c3.option = 'option3'
ORDER BY strength DESC
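
As a rough sketch of how per-query weights could be driven from the search form, the hypothetical helper `ranked` below takes a mapping of option name to weight and also reports each strength as a percentage of the maximum possible, which is the 50% / 100% figure the question asks for. It uses Python's sqlite3 module instead of PHP/MySQL so it is self-contained, and a portable CASE expression in place of MySQL's IF(); both substitutions, and the percentage column, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE criteria (
      cid INTEGER NOT NULL, option TEXT NOT NULL, value INTEGER NOT NULL
    );
    INSERT INTO items (id, name) VALUES (1,'Name1'), (2,'Name2'), (3,'Name3');
    INSERT INTO criteria VALUES
      (1,'option1',1), (1,'option2',1), (1,'option3',0),
      (2,'option1',0), (2,'option2',1), (2,'option3',1),
      (3,'option1',1), (3,'option2',0), (3,'option3',1);
    """
)


def ranked(weights):
    """weights maps option name -> weight, for this query only."""
    names = list(weights)
    count = range(len(names))
    # Weighted score: add an option's weight when the item has that option.
    score = " + ".join(
        f"CASE WHEN c{n}.value <> 0 THEN ? ELSE 0 END" for n in count
    )
    joins = " ".join(
        f"JOIN criteria c{n} ON c{n}.cid = i.id AND c{n}.option = ?"
        for n in count
    )
    sql = (
        f"SELECT i.name, {score} AS strength FROM items i {joins} "
        "ORDER BY strength DESC, i.id"
    )
    # Placeholders in the SELECT list come before those in the JOINs.
    params = [weights[name] for name in names] + names
    best = sum(weights.values())
    return [
        (name, strength, round(100 * strength / best))
        for name, strength in conn.execute(sql, params)
    ]


print(ranked({"option1": 2, "option3": 1}))
# [('Name3', 3, 100), ('Name1', 2, 67), ('Name2', 1, 33)]
print(ranked({"option1": 1, "option2": 1}))
# [('Name1', 2, 100), ('Name2', 1, 50), ('Name3', 1, 50)]
```

With equal weights the percentage reduces to "fraction of selected options present", so an item with one of two selected options comes out at 50%.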

Try the queries out and see if it's what you need.

I would also like to note that this is not the best solution in terms of processing power. I'd recommend that you add indexes, make the option field an integer, and cache results wherever possible.
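
One possible shape for the first two suggestions, sketched with Python's sqlite3 module so it can be run as is: the `option_id` column and the composite primary key are assumptions, not the answer's schema. The key on (cid, option_id) doubles as the index each JOIN looks up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- option_id replaces the varchar option name (1 = option1, and so on).
    CREATE TABLE criteria (
      cid INTEGER NOT NULL,
      option_id INTEGER NOT NULL,
      value INTEGER NOT NULL,
      PRIMARY KEY (cid, option_id)
    );
    INSERT INTO items VALUES (1,'Name1'), (2,'Name2'), (3,'Name3');
    INSERT INTO criteria VALUES
      (1,1,1), (1,2,1), (1,3,0),
      (2,1,0), (2,2,1), (2,3,1),
      (3,1,1), (3,2,0), (3,3,1);
    """
)

# Same ranking query as before, now comparing integers on an indexed key.
rows = conn.execute(
    """
    SELECT i.name, c1.value + c3.value AS strength
    FROM items i
    JOIN criteria c1 ON c1.cid = i.id AND c1.option_id = 1
    JOIN criteria c3 ON c3.cid = i.id AND c3.option_id = 3
    ORDER BY strength DESC, i.id
    """
).fetchall()
print(rows)  # [('Name3', 2), ('Name1', 1), ('Name2', 1)]
```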

Leave a comment if you have any questions or anything to add.
