优化多级 MySQL 子查询(大众分类法和分类法)
我正在阅读这篇精彩的标记文章 作者:Nitin Borwankar,他让我开始思考如何使用两个表来实现不同级别的搜索。
tags {
id,
tag
}
post_tags {
id
user_id
post_id
tag_id
}
我从 T(U(i))
的简单示例开始,这意味着拥有某个项目 i 的所有用户的所有标签。我能够使用以下 SQL 来完成此操作:
/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in
(
/* Get all user_ids that have taged this item */
SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id
然后我开始进行稍微困难的 +1 级深度查询。 T(U(T(u)))
这是使用 user # 等标签的用户的标签。
/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in
(
/* Then get users with these tags */
SELECT pt.user_id FROM post_tags as pt
LEFT JOIN tags as t on t.id = pt.tag_id
WHERE tag_id in
(
/* get tags of user */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE pt.user_id = 3
GROUP BY t.id
)
GROUP BY user_id
)
GROUP BY t.id
然而,由于我通常在查询中使用 JOIN,所以我不确定如何优化这样的东西,或者在使用子查询时需要避免哪些设计缺陷。我什至读过应该使用 JOIN 来代替,但我不知道如何通过上述查询来实现这一点。
如何优化这些查询?
更新
1) 将 GROUP BY
替换为 SELECT DISTINCT
。 (.74 秒)
2) 将 WHERE in
替换为 WHERE isn't
。 (.40 秒)
3)添加索引(哎呀!)(0.09 秒)
4)返回到 WHERE in
(0.08 秒)
EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
(
SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
)
)
运行 EXPLAIN 给出以下结果:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY post_tags index NULL tag_id 4 NULL 14 Using where
2 DEPENDENT SUBQUERY post_tags index_subquery user_id user_id 4 func 1 Using where
3 DEPENDENT SUBQUERY post_tags index_subquery user_id,tag_id tag_id 4 func 1 Using where
I was reading the great tagging article by Nitin Borwankar and he started me thinking of the ways to implement differnet levels of searches using two tables.
tags {
id,
tag
}
post_tags {
id
user_id
post_id
tag_id
}
I started with the simple example of T(U(i))
which means all tags of all users that have an item i. I was able to do it with the following SQL:
/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in
(
/* Get all user_ids that have taged this item */
SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id
Then I started with a slightly harder +1 level deep query. T(U(T(u)))
which is tags of users using tags like user #.
/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in
(
/* Then get users with these tags */
SELECT pt.user_id FROM post_tags as pt
LEFT JOIN tags as t on t.id = pt.tag_id
WHERE tag_id in
(
/* get tags of user */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE pt.user_id = 3
GROUP BY t.id
)
GROUP BY user_id
)
GROUP BY t.id
However, it since I normally use JOIN's in my queries I am not sure how something like this could be optimized or what design flaws need to be avoided when using subqueries. I have even read that JOIN's should be used instead, but I have no idea how this would be accomplished with the above queries.
How could these queries be optimized?
UPDATE
1) Replaced GROUP BY
with SELECT DISTINCT
. (.74 sec)
2) Replace WHERE in
with WHERE exists
. (.40 sec)
3) Added indexes (oops!) (0.09 sec)
4) Back to WHERE in
(0.08 sec)
EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
(
SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
)
)
Running EXPLAIN gives me these results:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY post_tags index NULL tag_id 4 NULL 14 Using where
2 DEPENDENT SUBQUERY post_tags index_subquery user_id user_id 4 func 1 Using where
3 DEPENDENT SUBQUERY post_tags index_subquery user_id,tag_id tag_id 4 func 1 Using where
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
根据我的说法,这是解决方案:
According to me this is the solution: