选择具有匹配标签的所有项目
我正在努力寻找最有效的方法来处理这个问题,但我必须先告诉你,我把事情搞砸了。环顾四周,发现没有任何相关的东西,所以就这样了。
如何选择与所需项目具有相似标签的所有项目?
以此表为例:
(用于重新创建表的 SQL 代码如下)
project 1 -> tagA | tagB | tagC
project 2 -> tagA | tagB
project 3 -> tagA
project 4 -> tagC
选择项目 1 应返回所有项目。
选择项目 4 应该只返回项目项目 1
到目前为止,我的查询非常依赖于左连接,并且肯定有更好的方法来执行此操作:
SELECT all_tags.project_id, all_tags.tag_id, final.title, tag.tag
FROM projects AS p
LEFT JOIN projects_to_tags AS pt ON p.num = pt.project_id
LEFT JOIN projects_to_tags AS all_tags ON pt.tag_id = all_tags.tag_id
LEFT JOIN projects AS final ON all_tags.project_id = final.num
LEFT JOIN tags AS tag ON all_tags.tag_id = tag.tag_id
WHERE p.num = 4
GROUP BY final.num
谢谢大家的输入。我想与大家分享 100k 个项目数据库、100k 个标签数据库以及 100k 个projects_to_tags 关系的所有查询的平均结果。所有查询均更改为询问project_1。
甜蜜而简短:
0.0160 sec - OMG Ponies - Using JOINS
0.0208 sec - jdelard
0.2581 sec - OMG Ponies - Using EXISTS
0.2777 sec - OMG Ponies - Using IN
0.5295 sec - Emtucifor - updated query
0.5088 sec - Emtucifor - first query
非常感谢大家。将相应地更新我的所有查询。
这里是所有查询和相应的 MySQL EXPLAIN 以及时间的
===============================================================================================================================================
Emtucifor - updated query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5295 sec)
SELECT *
FROM projects AS L
WHERE L.num !=1-- instead of <> PT2.project_id inside
AND EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL PRIMARY NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using index
===============================================================================================================================================
Emtucifor - first query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5088 sec)
SELECT *
FROM projects AS L
WHERE
EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
AND PT2.project_id <> L.num
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL NULL NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using where; Using index
===============================================================================================================================================
jdelard
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.0208 sec)
SELECT p.num, p.title
FROM projects_to_tags pt1, projects_to_tags pt2, projects p
WHERE pt1.project_id =1
AND pt2.project_id !=1
AND pt1.tag_id = pt2.tag_id
AND p.num = pt2.project_id
GROUP BY pt2.project_id
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pt1 ref project_id project_id 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE pt2 index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt2.project_id 1
===============================================================================================================================================
OMG Ponies - Using IN
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2777 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE pt.tag_id
IN (
SELECT x.tag_id
FROM projects_to_tags x
WHERE x.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY pt index project_id project_id 8 NULL 100001 Using where; Using index
1 PRIMARY p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
2 DEPENDENT SUBQUERY x ref project_id project_id 8 const,func 12000 Using where; Using index
===============================================================================================================================================
OMG Ponies - Using EXISTS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2581 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE EXISTS (
SELECT NULL
FROM projects_to_tags x
WHERE x.project_id = 1
AND x.tag_id = pt.tag_id
)
LIMIT 0 , 30
===============================================================================================================================================
OMG Ponies - Using JOINS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.0160 sec)
SELECT DISTINCT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
JOIN projects_to_tags x ON x.tag_id = pt.tag_id
AND x.project_id = 1
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE x ref project_id project_id 4 const 1 Using index; Using temporary
1 SIMPLE pt index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
SQL 代码,用于复制/粘贴和混乱。
CREATE TABLE IF NOT EXISTS `projects` (
`num` int(2) NOT NULL auto_increment,
`title` varchar(30) NOT NULL,
PRIMARY KEY (`num`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
INSERT INTO `projects` (`num`, `title`) VALUES(1, 'project 1'),(2, 'project 2'),(3, 'project 3'),(4, 'project 4');
CREATE TABLE IF NOT EXISTS `projects_to_tags` (
`project_id` int(2) NOT NULL,
`tag_id` int(2) NOT NULL,
KEY `project_id` (`project_id`,`tag_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `projects_to_tags` (`project_id`, `tag_id`) VALUES(1, 1),(1, 2),(1, 3),(2, 1),(2, 2),(3, 1),(4, 3);
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(2) NOT NULL auto_increment,
`tag` varchar(30) NOT NULL,
PRIMARY KEY (`tag_id`),
UNIQUE KEY `tag` (`tag`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
INSERT INTO `tags` (`tag_id`, `tag`) VALUES(1, 'tag a'),(2, 'tag b'),(3, 'tag c');
I'm trying to find the most efficient way of dealing with this but I must tell you front-head I've made a mess of it. Looked around SO and found nothing of relevance so here it goes.
How to select all projects that have similar tags to the desired project?
Take this table for example:
(sql code to recreate tables bellow)
project 1 -> tagA | tagB | tagC
project 2 -> tagA | tagB
project 3 -> tagA
project 4 -> tagC
Selecting project 1 should return back all projects.
Selecting project 4 should only return project project 1
My query so far is pretty dependant of left joins and for sure there is a better way to do this:
SELECT all_tags.project_id, all_tags.tag_id, final.title, tag.tag
FROM projects AS p
LEFT JOIN projects_to_tags AS pt ON p.num = pt.project_id
LEFT JOIN projects_to_tags AS all_tags ON pt.tag_id = all_tags.tag_id
LEFT JOIN projects AS final ON all_tags.project_id = final.num
LEFT JOIN tags AS tag ON all_tags.tag_id = tag.tag_id
WHERE p.num = 4
GROUP BY final.num
Thank you all for the input. I though I'd share with you guys the average results of all the queries on a 100k projects database, 100k tags database with a 100k projects_to_tags relation. All queries were changed to ask for project_1.
The sweet and short:
0.0160 sec - OMG Ponies - Using JOINS
0.0208 sec - jdelard
0.2581 sec - OMG Ponies - Using EXISTS
0.2777 sec - OMG Ponies - Using IN
0.5295 sec - Emtucifor - updated query
0.5088 sec - Emtucifor - first query
Thank you all very much for this. Gonna update ALL my queries accordingly.
Here go all queries and respective MySQL EXPLAIN's along with time's
===============================================================================================================================================
Emtucifor - updated query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5295 sec)
SELECT *
FROM projects AS L
WHERE L.num !=1-- instead of <> PT2.project_id inside
AND EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL PRIMARY NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using index
===============================================================================================================================================
Emtucifor - first query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5088 sec)
SELECT *
FROM projects AS L
WHERE
EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
AND PT2.project_id <> L.num
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL NULL NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using where; Using index
===============================================================================================================================================
jdelard
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.0208 sec)
SELECT p.num, p.title
FROM projects_to_tags pt1, projects_to_tags pt2, projects p
WHERE pt1.project_id =1
AND pt2.project_id !=1
AND pt1.tag_id = pt2.tag_id
AND p.num = pt2.project_id
GROUP BY pt2.project_id
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pt1 ref project_id project_id 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE pt2 index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt2.project_id 1
===============================================================================================================================================
OMG Ponies - Using IN
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2777 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE pt.tag_id
IN (
SELECT x.tag_id
FROM projects_to_tags x
WHERE x.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY pt index project_id project_id 8 NULL 100001 Using where; Using index
1 PRIMARY p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
2 DEPENDENT SUBQUERY x ref project_id project_id 8 const,func 12000 Using where; Using index
===============================================================================================================================================
OMG Ponies - Using EXISTS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2581 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE EXISTS (
SELECT NULL
FROM projects_to_tags x
WHERE x.project_id = 1
AND x.tag_id = pt.tag_id
)
LIMIT 0 , 30
===============================================================================================================================================
OMG Ponies - Using JOINS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.0160 sec)
SELECT DISTINCT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
JOIN projects_to_tags x ON x.tag_id = pt.tag_id
AND x.project_id = 1
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE x ref project_id project_id 4 const 1 Using index; Using temporary
1 SIMPLE pt index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
SQL code to copy/paste and mess around.
CREATE TABLE IF NOT EXISTS `projects` (
`num` int(2) NOT NULL auto_increment,
`title` varchar(30) NOT NULL,
PRIMARY KEY (`num`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
INSERT INTO `projects` (`num`, `title`) VALUES(1, 'project 1'),(2, 'project 2'),(3, 'project 3'),(4, 'project 4');
CREATE TABLE IF NOT EXISTS `projects_to_tags` (
`project_id` int(2) NOT NULL,
`tag_id` int(2) NOT NULL,
KEY `project_id` (`project_id`,`tag_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `projects_to_tags` (`project_id`, `tag_id`) VALUES(1, 1),(1, 2),(1, 3),(2, 1),(2, 2),(3, 1),(4, 3);
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(2) NOT NULL auto_increment,
`tag` varchar(30) NOT NULL,
PRIMARY KEY (`tag_id`),
UNIQUE KEY `tag` (`tag`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
INSERT INTO `tags` (`tag_id`, `tag`) VALUES(1, 'tag a'),(2, 'tag b'),(3, 'tag c');
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
在以下任何一种情况下,如果您不知道
PROJECT.num
/PROJECT_TO_TAGS.project_id
,则必须加入PROJECTS
code> table 来获取 id 值,以找出它关联的标签。使用 IN
使用 EXISTS
使用 JOINS(这是最有效的!)
DISTINCT 是必要的,因为 JOIN 存在在结果集中出现重复数据的风险...
In any of the following cases, if you don't know the
PROJECT.num
/PROJECT_TO_TAGS.project_id
, you'll have to join to thePROJECTS
table to get the id value for finding out what tags it has associated.Using IN
Using EXISTS
Using JOINS (this the most efficient one!)
The
DISTINCT
is necessary because JOINs risk duplicated data turning up in the resultset...怎么样...(项目 1 的示例)
也许可以在projects_to_tags 中为tag_id 添加一个单独的索引,这样您就可以单独使用它,而不是组合使用它。不再需要输入 ALL。 (表格扫描)
将 1 替换为 4 也能得到所需的结果。
How about... (example for project 1)
And maybe add a separate index for tag_id in projects_to_tags so you can use it alone, instead of the composite. No more type ALL. (Table Scan)
Replacing both 1 with 4 give also the desired results.
像这样的东西...?
这是 2 次搜索和一次扫描。
更新
从 jdelard 的书中摘取一页,一个微小的修改使我的查询优于他的查询(当然我是在 SQL Server 上执行此操作,这意味着我取出了他的 GROUP BY 并放入了 DISTINCT,所以 YMMV在 MySQL 上):
对其查询的改进来自于不执行 DISTINCT 或聚合,并使用半连接而不是完全连接,因此不必连接每一行。否则,在语义上它们基本上是相同的。
我必须记住 jdelard 的技巧,因为它是一个非常有用的工具。由于某种原因,查询引擎不够智能,无法计算给定的 {a = 4, a != b} 然后 {b != 4}。
Something like this... ?
That's 2 seeks and a scan.
UPDATE
Taking a page from jdelard's book, one tiny modification switches my query to outperform his (of course I'm doing this on SQL Server meaning I took out his GROUP BY and put in a DISTINCT, so YMMV on MySQL):
The improvement over his query comes from not doing a DISTINCT or aggregate, and using a semi join instead of a complete join so not every row has to be joined. Otherwise, semantically they are largely the same.
I will have to remember jdelard's trick as it is a very useful tool. For some reason the query engine was not smart enough to compute that given {a = 4, a != b} then {b != 4}.