MySQL count(*) 、Group BY 和 INNER JOIN
我在 MySQL 5.1 上的查询非常糟糕。 我简化了进行 JOIN 的 2 个表:
CREATE TABLE `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;
AND
CREATE TABLE `jobsCategories` (
`jobID` int(11) NOT NULL,
`industryID` int(11) NOT NULL,
KEY `jobID` (`jobID`),
KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
查询很简单:
SELECT count(*) as nb,industryID
FROM jobs J
INNER JOIN jobsCategories C ON C.jobID=J.id
GROUP BY industryID
ORDER BY nb DESC;
我在 jobs 表中添加了大约 150000 条记录,在 jobsCategories 表中添加了 350000 条记录,并且我有 30 个行业;
该查询大约需要 50 秒才能执行!
你知道为什么需要这么长时间吗?我怎样才能优化这个数据库的结构?对查询的分析表明,99% 的执行时间都花在了临时表上的复制上。
EXPLAIN <query> gives me :
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: J
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 178950
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: C
type: ref
possible_keys: jobID
key: jobID
key_len: 8
ref: J.id
rows: 1
Extra: Using where
2 rows in set (0.00 sec)
关于内存:
free -m :
total used free shared buffers cached
Mem: 2011 1516 494 0 8 1075
-/+ buffers/cache: 433 1578
Swap: 5898 126 5772
下面建议的 FORCE INDEX
select count(*) as nb, industryID
from
jobs J
inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID
order by nb DESC;
SHOW PROFILE;
给了我:
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000095 |
| Opening tables | 0.000014 |
| System lock | 0.000008 |
| Table lock | 0.000007 |
| init | 0.000032 |
| optimizing | 0.000011 |
| statistics | 0.000032 |
| preparing | 0.000016 |
| Creating tmp table | 0.000031 |
| executing | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result | 0.000028 |
| Sending data | 0.000024 |
| end | 0.000003 |
| removing tmp table | 0.000009 |
| end | 0.000004 |
| query end | 0.000003 |
| freeing items | 0.000029 |
| logging slow query | 0.000003 |
| cleaning up | 0.000003 |
+----------------------+----------+
我猜我的 RAM (2Gb) 不够大。我如何确定情况确实如此?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
首先,我认为您不需要连接表 jobs 来获得相同的结果(除非表 jobsCategories 中有一些垃圾数据):
否则您可以尝试强制对 industryID 建立索引:
Firstly I think that you don't need to join table jobs in order to get the same result (unless you have some garbage data in table jobsCategories):
Otherwise you may try to force index on industryID:
将表更改为 InnoDB =) InnoDB 可以很好地管理大表,并且 COUNT(*) 可以使其更快
http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/
祝你好运
编辑:
经过测试,在没有
WHERE
子句的情况下,使用COUNT(*)
时,MyISAM 似乎比 InnoDB 更快:http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
无论如何,我已经测试了模拟您拥有的表的确切查询( 150k Jobs 和 300k JobsCategories)使用 MyISAM 表,花了 1.5 秒,所以也许你的问题在其他地方..这就是我能告诉你的=P
change your tables to InnoDB =) InnoDB is good managing big tables and the COUNT(*) to make it faster
http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/
Good Luck
EDIT:
after testing, it seems that MyISAM is faster than InnoDB when using
COUNT(*)
when there is noWHERE
clause:http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
anyway, i've tested your exact query simulating the tables that you have (150k Jobs and 300k JobsCategories) using MyISAM tables and it took 1.5 seconds so maybe your problem is elsewhere.. it's all i can tell you =P
希望我没有误解阅读内容,但从我看来,你不需要任何加入。由于您的分组是每个相应行业有多少工作,因此所有工作都在您的工作类别表中,为什么要加入实际工作表中的工作标题,因为甚至没有返回
每条评论/反馈...
编辑 肯定会有所不同...添加与作业关联的条件...确保您的作业表在您希望允许基于...的元素上有一个索引...然后按照与最初相同的类似查询进行操作。确保您的 Jobs 表具有 CountryID 索引。
Hope I'm not misinterpreting the reading, but from what I see, you don't need ANY join. Since your grouping is how many jobs fall under each respective industry, its all in your job categories table, why join to the actual job table for the title of the job since that is not even being returned
EDIT PER COMMENT / FEEDBACK...
That definitely makes a difference... adding a criteria associated with a job... Ensure your Jobs table has an index on the element you are expecting to allow limiting based on... Then follow similar query like you originally had. Ensure your Jobs table has an index on CountryID.