Optimizing a slow query (SQL included)
SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name",
"myapp_profile"."age", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"
FROM "myapp_profile"
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id")
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id")
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id")
WHERE ("auth_user"."is_superuser" = false
AND "auth_user"."is_staff" = false
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16)))
ORDER BY "myapp_profile"."user_id"
DESC LIMIT 100
The above query takes about 100 seconds to run with 2 million users/profiles. I'm no DBA, and our DBAs are looking at the situation to see what can be done, but since I'll likely never get to see what changes (assuming it happens at the DB level), I'm curious how you would optimize this query. It obviously needs to run a ton faster, on the order of 5 seconds or less. If there is no way to optimize the SQL, is there an index or indexes you could add or change to make the query quicker, or is there something else I'm overlooking?
Postgres 9 is the DB, and Django's ORM is where this query came from.
Query Plan
Limit (cost=1374.35..1383.10 rows=100 width=106)
  -> Unique (cost=1374.35..1391.24 rows=193 width=106)
        -> Sort (cost=1374.35..1374.83 rows=193 width=106)
              Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
              -> Nested Loop (cost=453.99..1367.02 rows=193 width=106)
                    -> Hash Left Join (cost=453.99..1302.53 rows=193 width=49)
                          Hash Cond: (myapp_siterel.site_id = django_site.id)
                          Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
                          -> Hash Left Join (cost=448.50..1053.27 rows=15001 width=53)
                                Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
                                -> Seq Scan on myapp_profile (cost=0.00..286.01 rows=15001 width=49)
                                -> Hash (cost=261.00..261.00 rows=15000 width=8)
                                      -> Seq Scan on myapp_siterel (cost=0.00..261.00 rows=15000 width=8)
                          -> Hash (cost=3.55..3.55 rows=155 width=4)
                                -> Seq Scan on django_site (cost=0.00..3.55 rows=155 width=4)
                    -> Index Scan using auth_user_pkey on auth_user (cost=0.00..0.32 rows=1 width=57)
                          Index Cond: (auth_user.id = myapp_profile.user_id)
                          Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))
Thanks
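The plan above estimates about 15,001 rows in myapp_profile and filters on site ids 10080 and 10053 rather than 15 and 16, so it appears to come from a different run of the query, probably on a smaller database, not the 2-million-row one. A minimal sketch of how to get a plan that reflects the slow case, assuming you can run it against the full-size data: refresh the planner statistics, then use EXPLAIN with the ANALYZE and BUFFERS options (the parenthesized option syntax needs PostgreSQL 9.0 or later). Only the table names come from the question.

```sql
-- Refresh the planner's statistics so its row estimates match the real data.
-- (One table per statement: listing several tables in one ANALYZE needs PostgreSQL 11+.)
ANALYZE auth_user;
ANALYZE myapp_profile;
ANALYZE myapp_siterel;

-- ANALYZE makes Postgres actually execute the query and report real row counts
-- and timings per plan node; BUFFERS adds buffer usage per node.
-- Because it really runs the query, expect it to take about as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name",
"myapp_profile"."age", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"
FROM "myapp_profile"
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id")
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id")
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id")
WHERE ("auth_user"."is_superuser" = false
AND "auth_user"."is_staff" = false
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16)))
ORDER BY "myapp_profile"."user_id"
DESC LIMIT 100;
```

Comparing each node's estimated rows with its actual rows usually shows where the time goes, for example a sort over millions of joined rows before the LIMIT can apply.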
I'm not so familiar with postgres, so I'm not sure how good its query optimiser is, but it looks like everything you have in the where clause could instead be join conditions. I'd hope postgres is clever enough to work that out for itself; if it's not, it's going to fetch all 2 million of your users with their related records in the other 3 tables and only then filter them using your where clause.
The indexes already mentioned should also work for you if they don't already exist. Again, I'm more of an MSSQL person, but doesn't postgres have some statistics profile or query plan you can look at?
Something along these lines
Also, do you need the distinct? That'll also slow it down somewhat.
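One caveat on turning the where clause into join conditions: the is_superuser/is_staff tests can move into the INNER JOIN's ON clause without changing the result, but moving the django_site test into a LEFT JOIN's ON clause would not be equivalent, because a LEFT JOIN keeps every profile row even when its ON condition fails. Below is a sketch of a different, semantics-preserving rewrite using EXISTS subqueries, which also makes DISTINCT unnecessary. It is not the snippet from this answer. It assumes one profile per user (myapp_profile.user_id is unique) and that myapp_siterel.site_id is a NOT NULL foreign key to django_site.id, so django_site itself does not need to be joined.

```sql
-- Sketch only. Assumes: myapp_profile.user_id is unique, and
-- myapp_siterel.site_id is a NOT NULL foreign key to django_site.id.
SELECT p.user_id, p.name, p.age,
       u.id, u.username, u.first_name, u.last_name, u.email,
       u.password, u.is_staff, u.is_active, u.is_superuser,
       u.last_login, u.date_joined
FROM myapp_profile AS p
INNER JOIN auth_user AS u ON u.id = p.user_id
WHERE u.is_superuser = false
  AND u.is_staff = false
  AND (
        -- Profiles with no site relation at all
        -- (the original's "django_site"."id" IS NULL branch).
        NOT EXISTS (SELECT 1
                    FROM myapp_siterel AS r
                    WHERE r.profile_id = p.user_id)
        -- Profiles related to site 15 or 16. EXISTS yields each profile
        -- at most once, so no DISTINCT is needed to undo join fan-out.
     OR EXISTS (SELECT 1
                FROM myapp_siterel AS r
                WHERE r.profile_id = p.user_id
                  AND r.site_id IN (15, 16))
      )
ORDER BY p.user_id DESC
LIMIT 100;
```

Whether this is faster in practice depends on the plan Postgres chooses, so compare both versions with EXPLAIN ANALYZE on the real data. On the Django side, Exists() and OuterRef() subquery expressions (available since Django 1.11) can express this shape instead of .distinct().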
For basics:
make sure all the user id fields are indexed.
It also looks like you would do well with an index on is_superuser and is_staff.
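A plain single-column index on a boolean such as is_staff rarely helps by itself, because each value usually matches a large share of the table. A common PostgreSQL alternative is a partial index whose WHERE clause matches the query's filter, plus a composite index on the relation table used in the joins. A sketch with hypothetical index names; Django normally creates single-column indexes on ForeignKey columns already, so check what exists first (for example with \d myapp_siterel in psql).

```sql
-- Index names below are hypothetical; pick names that fit your conventions.

-- Composite index so lookups on myapp_siterel by profile (and site)
-- can be answered from a single index.
CREATE INDEX CONCURRENTLY myapp_siterel_profile_site_idx
    ON myapp_siterel (profile_id, site_id);

-- Partial index holding only regular (non-staff, non-superuser) accounts.
-- Its predicate matches the query's filter, which is what lets the
-- planner consider it for this query.
CREATE INDEX CONCURRENTLY auth_user_regular_users_idx
    ON auth_user (id)
    WHERE is_superuser = false AND is_staff = false;
```

CREATE INDEX CONCURRENTLY builds the index without blocking writes to the table, but it cannot run inside a transaction block. Whether the planner actually uses either index is something to confirm with EXPLAIN ANALYZE afterwards.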
There's never a straightforward silver-bullet solution for query optimization; however, the obvious step is to index the columns you're searching on, in your case, that's: