如何使用mysql的EXPLAIN查找可能出现的问题
对站点进行压力测试显然,一切都在崩溃。
今天的问题:WSOD 有好几页。几个小时后,我将一个页面上的问题缩小到这个查询(我希望):它过去只需要一秒钟即可运行;现在需要> 300.
SELECT jobs.posting_date ,
jobs.id ,
jobs.title ,
addresses.street ,
cities.name ,
states.abbr ,
details.target_url ,
details.description_extracted AS extraction,
COUNT(jobs_skills.skill_id) AS skills ,
users.first_name
FROM jobs
JOIN addresses
ON addresses.id = jobs.address_id
JOIN states
ON addresses.state_id = states.id
JOIN cities
ON addresses.city_id = cities.id
JOIN job_feed_details AS details
ON jobs.id = details.job_id
LEFT JOIN jobs_skills
ON jobs.id = jobs_skills.job_id
LEFT JOIN users
ON users.id = details.user_id
WHERE details.moderated = 0
AND expiration = 0
GROUP BY jobs.id
ORDER BY jobs.posting_date DESC
运行 EXPLAIN
我得到这个:
id select_type table type possible keys key key_len ref rows extra
1 SIMPLE details ALL job_id 537704 Using where; Using temporary; Using filesort
1 SIMPLE jobs eq_ref PRIMARY,address_id_indexPRIMARY 4 557574_dev.details.job_id 1 Using where
1 SIMPLE addresses eq_ref PRIMARY PRIMARY 4 557574_dev.jobs.address_id 1 Using where
1 SIMPLE states eq_ref PRIMARY PRIMARY 1 557574_dev.addresses.state_id 1 Using where
1 SIMPLE cities eq_ref PRIMARY PRIMARY 4 557574_dev.addresses.city_id 1
1 SIMPLE jobs_skills ref Job_skill Job_skill 4 557574_dev.jobs.id 4 Using index
1 SIMPLE users eq_ref PRIMARY PRIMARY 3 557574_dev.details.user_id 1
查看 EXPLAIN
是否可以判断
- 是否发生全表扫描
- 是否缺少任何相关切口
- 哪个表或连接是太慢了
- 我的“寻找慢表的任务”中的任何其他有用信息
更新:在没有 group_by (和相关的表连接)的情况下再次运行查询;仍然需要临时表和文件排序,所以这似乎是一个索引问题。将开始查看所有表以查找缺失的索引。
Stress testing a site & everything is breaking, obviously.
Today's problem: WSOD on several pages. After a few hours I have narrowed the problem on one page down to this query (I hope): It used to run in a second; now it takes > 300.
SELECT jobs.posting_date ,
jobs.id ,
jobs.title ,
addresses.street ,
cities.name ,
states.abbr ,
details.target_url ,
details.description_extracted AS extraction,
COUNT(jobs_skills.skill_id) AS skills ,
users.first_name
FROM jobs
JOIN addresses
ON addresses.id = jobs.address_id
JOIN states
ON addresses.state_id = states.id
JOIN cities
ON addresses.city_id = cities.id
JOIN job_feed_details AS details
ON jobs.id = details.job_id
LEFT JOIN jobs_skills
ON jobs.id = jobs_skills.job_id
LEFT JOIN users
ON users.id = details.user_id
WHERE details.moderated = 0
AND expiration = 0
GROUP BY jobs.id
ORDER BY jobs.posting_date DESC
Running EXPLAIN
I get this:
id select_type table type possible keys key key_len ref rows extra
1 SIMPLE details ALL job_id 537704 Using where; Using temporary; Using filesort
1 SIMPLE jobs eq_ref PRIMARY,address_id_indexPRIMARY 4 557574_dev.details.job_id 1 Using where
1 SIMPLE addresses eq_ref PRIMARY PRIMARY 4 557574_dev.jobs.address_id 1 Using where
1 SIMPLE states eq_ref PRIMARY PRIMARY 1 557574_dev.addresses.state_id 1 Using where
1 SIMPLE cities eq_ref PRIMARY PRIMARY 4 557574_dev.addresses.city_id 1
1 SIMPLE jobs_skills ref Job_skill Job_skill 4 557574_dev.jobs.id 4 Using index
1 SIMPLE users eq_ref PRIMARY PRIMARY 3 557574_dev.details.user_id 1
looking at the EXPLAIN
is it possible to tell
- If there are any full table scans happening
- If any relevant incises are missing
- Which table or join is being so slow
- Any other useful information in my 'quest to find the slow table'
Update: Running the query again without the group_by (and related table joins); still is requiring a temp table and filesort, so it seems it is an index issue. Will begin looking at all the tables for missing indices.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
你定义了哪些指数?
如果您索引 jobs.address_id、addresses.state_id、addresses.city_id、details.job_id、jobs_skills.job_id、details.user_id 和 jobs.posting_date,您应该能够从索引执行整个连接,而无需访问基础表并运行使用索引排序。
另外,职位是否按照发布日期顺序插入?如果是这样,您可以按 id 排序,而不是按 posts_date 排序,这会更快,因为它是主键。
解释计划看起来大部分处理都是在分组和排序中。您在最后一步中得到了文件排序和临时表,这是相当昂贵的。此外,看起来您在可能应该使用索引的地方使用了 where ,因此您可能需要确保所有关联列都已建立索引。
我建议在沙箱中加载数据并使用索引组合,直到您的解释计划使用更多索引并且希望没有临时表或文件排序。尽管分组往往很昂贵,但您可能会在最后一部分遇到一些困难。
这有帮助吗?
what indices do you have defined?
If you index jobs.address_id, addresses.state_id, addresses.city_id, details.job_id, jobs_skills.job_id, details.user_id, and jobs.posting_date you should be able to do the entire join from indices without hitting the underlying table and run the ordering with an index.
Also, are jobs inserted in posting_date order? If so, you can order by id instead of by posting_date, which will be faster since it's a primary key.
The explain plan looks like the majority of the processing is in the grouping and ordering. You've got a filesort and temporary table in the final step, which is pretty expensive. In addition, it looks like you're using where in places where you probably should be using index, so you might want to make sure all the association columns are indexed.
I'd recommend loading up the data in your sandbox and playing with index combinations until your explain plan uses more indices and hopefully no temporary tables or filesorting. You might have some difficulty with that last part though as grouping tends to be expensive.
Does that help?
如果我错了,请原谅我,但似乎在生成的解释计划中的“额外”列下,它为您指定了是否对指定的表和使用的键使用索引,例如表:地址和关键作业.address_id,不使用索引。
因此,您所需要做的就是记下在额外列下看到“位置”的列。
对于这样的表可以考虑建立索引。
在最大的表上添加索引显然会对性能产生最大的影响,我相信您应该从那里开始。
Forgive me if I am wrong, but it seems like under the column 'extra' in the generated explain plan it is specified for you if an index will be used or not for the specified table and used key , e.g. table: addresses and key jobs.address_id, no index is used.
So all you need to do is note the columns of where you see 'where' under extra column.
For such a table you can consider making an index.
Adding an index on the largest table will obviously have the greatest effect on performance, and I believe you should start there.