Question about optimizing a slow query (SQL included)

Posted 2024-11-18 05:35:50

SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name", 
  "myapp_profile"."age", "auth_user"."id", "auth_user"."username", 
  "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", 
  "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", 
  "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" 
FROM "myapp_profile" 
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id") 
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id") 
WHERE ("auth_user"."is_superuser" = false 
AND "auth_user"."is_staff" = false 
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))) 
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100

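The above query takes about 100 seconds to run with 2 million users/profiles. I'm no DBA, and our DBAs are looking at the situation to see what can be done, but since I'll likely never get to see what changes (assuming it happens at the DB level), I'm curious how you would optimize this query. It obviously needs to run a lot faster than it does now, on the order of 5 seconds or less. If there is no way to optimize the SQL, is there an index (or indexes) you could add or change to make the query quicker, or is there something else I'm overlooking?

The database is Postgres 9, and the query was generated by Django's ORM.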

Query Plan

Limit (cost=1374.35..1383.10 rows=100 width=106)
  -> Unique (cost=1374.35..1391.24 rows=193 width=106)
    -> Sort (cost=1374.35..1374.83 rows=193 width=106)
       Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
      -> Nested Loop (cost=453.99..1367.02 rows=193 width=106)
        -> Hash Left Join (cost=453.99..1302.53 rows=193 width=49)
           Hash Cond: (myapp_siterel.site_id = django_site.id)
           Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
          -> Hash Left Join (cost=448.50..1053.27 rows=15001 width=53)
             Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
            -> Seq Scan on myapp_profile (cost=0.00..286.01 rows=15001 width=49)
            -> Hash (cost=261.00..261.00 rows=15000 width=8)
              -> Seq Scan on myapp_siterel (cost=0.00..261.00 rows=15000 width=8)
          -> Hash (cost=3.55..3.55 rows=155 width=4)
            -> Seq Scan on django_site (cost=0.00..3.55 rows=155 width=4)
        -> Index Scan using auth_user_pkey on auth_user (cost=0.00..0.32 rows=1 width=57)
           Index Cond: (auth_user.id = myapp_profile.user_id)
           Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))

Thanks



锦上情书 2024-11-25 05:35:50

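I'm not that familiar with Postgres, so I'm not sure how good its query optimizer is, but it looks like everything you have in the WHERE clause could instead be join conditions. I'd hope Postgres is clever enough to work that out for itself; if it isn't, it will fetch all 2 million users with their related records in the other 3 tables and only then filter them with your WHERE.

The indexes already mentioned should also work for you if they don't already exist. Again, I'm more of an MSSQL person, but doesn't Postgres have a statistics profile or query plan you can look at?

Something along these lines: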

SELECT DISTINCT
    "myapp_profile"."user_id",
    "myapp_profile"."name", 
    "myapp_profile"."age",
    "auth_user"."id",
    "auth_user"."username", 
    "auth_user"."first_name",
    "auth_user"."last_name",
    "auth_user"."email", 
    "auth_user"."password",
    "auth_user"."is_staff",
    "auth_user"."is_active", 
    "auth_user"."is_superuser",
    "auth_user"."last_login",
    "auth_user"."date_joined" 
FROM "myapp_profile" 
    INNER JOIN "auth_user"
        ON ("myapp_profile"."user_id" = "auth_user"."id") 
        AND "auth_user"."is_superuser" = false
        AND "auth_user"."is_staff" = false 
    LEFT OUTER JOIN "myapp_siterel"
        ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
    LEFT OUTER JOIN "django_site"
        ON ("myapp_siterel"."site_id" = "django_site"."id") 
        AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100

Also, do you need the DISTINCT? That will also slow it down somewhat.
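One thing worth checking before adopting the rewrite above: a filter moved into the ON clause of a LEFT OUTER JOIN no longer removes rows, it only decides whether the joined side is filled in or left NULL. The sketch below is a toy reproduction, not the real schema: it uses Python's sqlite3 as a stand-in for Postgres, trims the columns, and invents five users. It compares the original WHERE form, the ON form, and a hypothetical EXISTS form that needs no DISTINCT. The EXISTS form assumes that every myapp_siterel.site_id points at an existing django_site row, which the original post does not confirm.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres schema (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, is_staff INTEGER, is_superuser INTEGER);
CREATE TABLE myapp_profile (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE django_site (id INTEGER PRIMARY KEY);
CREATE TABLE myapp_siterel (profile_id INTEGER NOT NULL, site_id INTEGER NOT NULL);

-- Toy data: user 2 has no site, user 3 is only on site 99,
-- user 4 is staff, user 5 is on both site 15 and site 16.
INSERT INTO auth_user VALUES (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 1, 0), (5, 0, 0);
INSERT INTO myapp_profile VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
INSERT INTO django_site VALUES (15), (16), (99);
INSERT INTO myapp_siterel VALUES (1, 15), (3, 99), (4, 15), (5, 15), (5, 16);
""")

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# Filters in WHERE, as in the original question.
WHERE_FORM = """
SELECT DISTINCT p.user_id
FROM myapp_profile p
JOIN auth_user u ON p.user_id = u.id
LEFT JOIN myapp_siterel r ON p.user_id = r.profile_id
LEFT JOIN django_site s ON r.site_id = s.id
WHERE u.is_superuser = 0 AND u.is_staff = 0
  AND (s.id IS NULL OR s.id IN (15, 16))
ORDER BY p.user_id DESC LIMIT 100
"""

# Filters moved into the join conditions, as in the suggested rewrite.
ON_FORM = """
SELECT DISTINCT p.user_id
FROM myapp_profile p
JOIN auth_user u ON p.user_id = u.id AND u.is_superuser = 0 AND u.is_staff = 0
LEFT JOIN myapp_siterel r ON p.user_id = r.profile_id
LEFT JOIN django_site s ON r.site_id = s.id AND (s.id IS NULL OR s.id IN (15, 16))
ORDER BY p.user_id DESC LIMIT 100
"""

# Hypothetical alternative: no row multiplication, so no DISTINCT is needed.
# Assumes every site_id in myapp_siterel refers to an existing django_site row.
EXISTS_FORM = """
SELECT p.user_id
FROM myapp_profile p
JOIN auth_user u ON p.user_id = u.id
WHERE u.is_superuser = 0 AND u.is_staff = 0
  AND (NOT EXISTS (SELECT 1 FROM myapp_siterel r WHERE r.profile_id = p.user_id)
       OR EXISTS (SELECT 1 FROM myapp_siterel r
                  WHERE r.profile_id = p.user_id AND r.site_id IN (15, 16)))
ORDER BY p.user_id DESC LIMIT 100
"""

print(ids(WHERE_FORM))   # [5, 2, 1]
print(ids(ON_FORM))      # [5, 3, 2, 1]  (user 3, who is only on site 99, now slips through)
print(ids(EXISTS_FORM))  # [5, 2, 1]
```

On this toy data the inner-join conditions on auth_user can move freely between ON and WHERE, but the django_site condition cannot. Whether the EXISTS form is actually faster on Postgres 9 with 2 million rows would have to be confirmed with the real plan and timings.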



本宫微胖 2024-11-25 05:35:50

The basics:

Make sure all the user ID fields are indexed.

It looks like you could also benefit from an index on is_superuser and is_staff.
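As a rough illustration of the indexing advice, the sketch below again uses Python's sqlite3 as a stand-in for Postgres. The index names are made up, and the choice of a composite index on myapp_siterel plus a partial index on auth_user is an assumption, not something the DBAs in the question are known to have done. The query plan in the question shows a sequential scan on myapp_siterel, so an index on its profile_id column is a plausible candidate. Plain indexes on boolean columns are often not selective enough for the planner to use, which is why the flags are expressed here as a partial-index predicate instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, is_staff INTEGER, is_superuser INTEGER);
CREATE TABLE myapp_siterel (profile_id INTEGER NOT NULL, site_id INTEGER NOT NULL);

-- Composite index (hypothetical name): join lookups by profile_id,
-- with site_id readable from the index itself.
CREATE INDEX siterel_profile_site_idx ON myapp_siterel (profile_id, site_id);

-- Partial index (hypothetical name): only ordinary users are indexed.
CREATE INDEX auth_user_regular_idx ON auth_user (id)
    WHERE is_staff = 0 AND is_superuser = 0;
""")

# List the indexes that now exist.
index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
print(sorted(index_names))

# Ask SQLite how it would look up one profile's sites; the fourth column
# of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT site_id FROM myapp_siterel WHERE profile_id = 5"
    )
)
print(plan)
```

In Postgres the partial-index predicate would be written against real booleans, for example `WHERE NOT is_staff AND NOT is_superuser`, and `EXPLAIN ANALYZE` on the real data is the way to see whether the planner actually picks either index up.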


梦回梦里 2024-11-25 05:35:50

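There's never a straightforward silver-bullet solution for query optimization. The obvious first step, however, is to index the columns you're searching on, which in your case are: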

"auth_user"."is_superuser"
"auth_user"."is_staff"
"django_site"."id"
"myapp_profile"."user_id"

