优化多级 MySQL 子查询（大众分类法和分类法）

发布于 2024-08-23 13:43:41 字数 2350 浏览 10 评论 0原文

我正在阅读这篇精彩的标记文章作者：Nitin Borwankar，他让我开始思考如何使用两个表来实现不同级别的搜索。

tags {
  id,
  tag
}

post_tags {
  id
  user_id
  post_id
  tag_id
}

我从 T(U(i)) 的简单示例开始，这意味着拥有某个项目 i 的所有用户的所有标签。我能够使用以下 SQL 来完成此操作：

/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in 
(
    /* Get all user_ids that have taged this item */
    SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id

然后我开始进行稍微困难的 +1 级深度查询。 T(U(T(u))) 这是使用 user # 等标签的用户的标签。

/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in 
(
    /* Then get users with these tags */
    SELECT pt.user_id FROM post_tags as pt
    LEFT JOIN tags as t on t.id = pt.tag_id
    WHERE tag_id in
    (
        /* get tags of user */
        SELECT t.id FROM post_tags as pt
        LEFT JOIN tags as t ON t.id = pt.tag_id
        WHERE pt.user_id = 3
        GROUP BY t.id
    )
    GROUP BY user_id
)
GROUP BY t.id

然而，由于我通常在查询中使用 JOIN，所以我不确定如何优化这样的东西，或者在使用子查询时需要避免哪些设计缺陷。我什至读过应该使用 JOIN 来代替，但我不知道如何通过上述查询来实现这一点。

如何优化这些查询？

更新

1) 将 GROUP BY 替换为 SELECT DISTINCT。（.74 秒）

2) 将 WHERE in 替换为 WHERE isn't。（.40 秒）

3）添加索引（哎呀！）（0.09 秒）

4）返回到 WHERE in （0.08 秒）

EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
    SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
    (
        SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
    )
)

运行 EXPLAIN 给出以下结果：

id  select_type     table       type        possible_keys   key key_len ref rows    Extra
1   PRIMARY         post_tags   index       NULL        tag_id  4   NULL    14  Using where
2   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id     user_id 4   func    1   Using where
3   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id,tag_id  tag_id  4   func    1   Using where

原文

I was reading the great tagging article by Nitin Borwankar and he started me thinking of the ways to implement differnet levels of searches using two tables.

tags {
  id,
  tag
}

post_tags {
  id
  user_id
  post_id
  tag_id
}

I started with the simple example of T(U(i)) which means all tags of all users that have an item i. I was able to do it with the following SQL:

/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in 
(
    /* Get all user_ids that have taged this item */
    SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id

Then I started with a slightly harder +1 level deep query. T(U(T(u))) which is tags of users using tags like user #.

/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in 
(
    /* Then get users with these tags */
    SELECT pt.user_id FROM post_tags as pt
    LEFT JOIN tags as t on t.id = pt.tag_id
    WHERE tag_id in
    (
        /* get tags of user */
        SELECT t.id FROM post_tags as pt
        LEFT JOIN tags as t ON t.id = pt.tag_id
        WHERE pt.user_id = 3
        GROUP BY t.id
    )
    GROUP BY user_id
)
GROUP BY t.id

However, it since I normally use JOIN's in my queries I am not sure how something like this could be optimized or what design flaws need to be avoided when using subqueries. I have even read that JOIN's should be used instead, but I have no idea how this would be accomplished with the above queries.

How could these queries be optimized?

UPDATE

1) Replaced GROUP BY with SELECT DISTINCT. (.74 sec)

2) Replace WHERE in with WHERE exists. (.40 sec)

3) Added indexes (oops!) (0.09 sec)

4) Back to WHERE in (0.08 sec)

EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
    SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
    (
        SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
    )
)

Running EXPLAIN gives me these results:

id  select_type     table       type        possible_keys   key key_len ref rows    Extra
1   PRIMARY         post_tags   index       NULL        tag_id  4   NULL    14  Using where
2   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id     user_id 4   func    1   Using where
3   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id,tag_id  tag_id  4   func    1   Using where

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浪菊怪哟 2024-08-30 13:43:41

根据我的说法，这是解决方案：

SELECT DISTINCT(`t`.`id`) FROM `post_tags` as `pt`
    left join `tags` as t on `t`.`id` = `pt`.`tag_id`
    where `pt`.`user_id` in(
        SELECT distinct(`pt`.`user_id`) FROM `post_tags` as `pt`
             LEFT JOIN `tags` as `t` on `t`.`id` = `pt`.`tag_id`
              WHERE `pt`.`tag_id` in(
                 SELECT distinct(`tag_id`) FROM `post_tags` 
                 WHERE pt.user_id = 3
            )
    )

According to me this is the solution:

SELECT DISTINCT(`t`.`id`) FROM `post_tags` as `pt`
    left join `tags` as t on `t`.`id` = `pt`.`tag_id`
    where `pt`.`user_id` in(
        SELECT distinct(`pt`.`user_id`) FROM `post_tags` as `pt`
             LEFT JOIN `tags` as `t` on `t`.`id` = `pt`.`tag_id`
              WHERE `pt`.`tag_id` in(
                 SELECT distinct(`tag_id`) FROM `post_tags` 
                 WHERE pt.user_id = 3
            )
    )

回复收藏 0 原文

~没有更多了~