MySQL count(*) 、Group BY 和 INNER JOIN

发布于 2024-11-19 23:27:21 字数 2811 浏览 4 评论 0 原文

我在 MySQL 5.1 上的查询非常糟糕。 我简化了进行 JOIN 的 2 个表:

CREATE TABLE  `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;

AND

CREATE TABLE `jobsCategories` (
 `jobID` int(11) NOT NULL,
 `industryID` int(11) NOT NULL,
 KEY `jobID` (`jobID`),
 KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

查询很简单:

SELECT count(*) as nb,industryID 
FROM  jobs J 
INNER JOIN jobsCategories C ON C.jobID=J.id 
GROUP BY industryID 
ORDER BY nb DESC;

我在 jobs 表中添加了大约 150000 条记录,在 jobsCategories 表中添加了 350000 条记录,并且我有 30 个行业;

该查询大约需要 50 秒才能执行!

你知道为什么需要这么长时间吗?我怎样才能优化这个数据库的结构?对查询的分析表明,99% 的执行时间都花在了临时表上的复制上。

EXPLAIN <query> gives me : 


*************************** 1. row ***************************
       id: 1
select_type: SIMPLE
    table: J
     type: index
possible_keys: PRIMARY
      key: PRIMARY
  key_len: 4
      ref: NULL
     rows: 178950
    Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
       id: 1
 select_type: SIMPLE
    table: C
     type: ref
possible_keys: jobID
      key: jobID
  key_len: 8
      ref: J.id
     rows: 1
    Extra: Using where
2 rows in set (0.00 sec)

关于内存:

free -m  : 

total       used       free     shared    buffers     cached
Mem:          2011       1516        494          0          8       1075
-/+ buffers/cache:        433       1578
Swap:         5898        126       5772

下面建议的 FORCE INDEX

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

SHOW PROFILE;

给了我:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000095 |
| Opening tables       | 0.000014 |
| System lock          | 0.000008 |
| Table lock           | 0.000007 |
| init                 | 0.000032 |
| optimizing           | 0.000011 |
| statistics           | 0.000032 |
| preparing            | 0.000016 |
| Creating tmp table   | 0.000031 |
| executing            | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result       | 0.000028 |
| Sending data         | 0.000024 |
| end                  | 0.000003 |
| removing tmp table   | 0.000009 |
| end                  | 0.000004 |
| query end            | 0.000003 |
| freeing items        | 0.000029 |
| logging slow query   | 0.000003 |
| cleaning up          | 0.000003 |
+----------------------+----------+

我猜我的 RAM (2Gb) 不够大。我如何确定情况确实如此?

I have a really bad time with a query on MySQL 5.1.
I simplified the 2 tables I make a JOIN on :

CREATE TABLE  `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;

AND

CREATE TABLE `jobsCategories` (
 `jobID` int(11) NOT NULL,
 `industryID` int(11) NOT NULL,
 KEY `jobID` (`jobID`),
 KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

The query is straight forward :

SELECT count(*) as nb,industryID 
FROM  jobs J 
INNER JOIN jobsCategories C ON C.jobID=J.id 
GROUP BY industryID 
ORDER BY nb DESC;

I got around 150000 records into the jobs table, and 350000 records into the jobsCategories table, and I have 30 industries;

The query takes approximatively 50 seconds to execute !!!

Do you have any idea why it takes so long? How could I optimize the structure of this database? Profilling the query show me that 99% of the execution time is spend on copying on tmp tables.

EXPLAIN <query> gives me : 


*************************** 1. row ***************************
       id: 1
select_type: SIMPLE
    table: J
     type: index
possible_keys: PRIMARY
      key: PRIMARY
  key_len: 4
      ref: NULL
     rows: 178950
    Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
       id: 1
 select_type: SIMPLE
    table: C
     type: ref
possible_keys: jobID
      key: jobID
  key_len: 8
      ref: J.id
     rows: 1
    Extra: Using where
2 rows in set (0.00 sec)

About the memory :

free -m  : 

total       used       free     shared    buffers     cached
Mem:          2011       1516        494          0          8       1075
-/+ buffers/cache:        433       1578
Swap:         5898        126       5772

With the FORCE INDEX suggested below

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

SHOW PROFILE;

gives me :

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000095 |
| Opening tables       | 0.000014 |
| System lock          | 0.000008 |
| Table lock           | 0.000007 |
| init                 | 0.000032 |
| optimizing           | 0.000011 |
| statistics           | 0.000032 |
| preparing            | 0.000016 |
| Creating tmp table   | 0.000031 |
| executing            | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result       | 0.000028 |
| Sending data         | 0.000024 |
| end                  | 0.000003 |
| removing tmp table   | 0.000009 |
| end                  | 0.000004 |
| query end            | 0.000003 |
| freeing items        | 0.000029 |
| logging slow query   | 0.000003 |
| cleaning up          | 0.000003 |
+----------------------+----------+

I guess my RAM (2Gb) is not large enough. How can I be certain this is the case?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

黑色毁心梦 2024-11-26 23:27:21

首先,我认为您不需要连接表 jobs 来获得相同的结果(除非表 jobsCategories 中有一些垃圾数据):

select count(*) as nb, industryID 
from jobsCategories
group by industryID 
order by nb DESC;

否则您可以尝试强制对 industryID 建立索引:

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

Firstly I think that you don't need to join table jobs in order to get the same result (unless you have some garbage data in table jobsCategories):

select count(*) as nb, industryID 
from jobsCategories
group by industryID 
order by nb DESC;

Otherwise you may try to force index on industryID:

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;
终陌 2024-11-26 23:27:21

将表更改为 InnoDB =) InnoDB 可以很好地管理大表,并且 COUNT(*) 可以使其更快

http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/

祝你好运

编辑:
经过测试,在没有 WHERE 子句的情况下,使用 COUNT(*) 时,MyISAM 似乎比 InnoDB 更快:

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

无论如何,我已经测试了模拟您拥有的表的确切查询( 150k Jobs 和 300k JobsCategories)使用 MyISAM 表,花了 1.5 秒,所以也许你的问题在其他地方..这就是我能告诉你的=P

change your tables to InnoDB =) InnoDB is good managing big tables and the COUNT(*) to make it faster

http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/

Good Luck

EDIT:
after testing, it seems that MyISAM is faster than InnoDB when using COUNT(*) when there is no WHERE clause:

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

anyway, i've tested your exact query simulating the tables that you have (150k Jobs and 300k JobsCategories) using MyISAM tables and it took 1.5 seconds so maybe your problem is elsewhere.. it's all i can tell you =P

不甘平庸 2024-11-26 23:27:21

希望我没有误解阅读内容,但从我看来,你不需要任何加入。由于您的分组是每个相应行业有多少工作,因此所有工作都在您的工作类别表中,为什么要加入实际工作表中的工作标题,因为甚至没有返回

select IndustryID,
       count(*) JobsPerIndustry
   from JobCategories
   group by IndustryID

每条评论/反馈...

编辑 肯定会有所不同...添加与作业关联的条件...确保您的作业表在您希望允许基于...的元素上有一个索引...然后按照与最初相同的类似查询进行操作。确保您的 Jobs 表具有 CountryID 索引。

SELECT
      count(*) as nb,
      industryID 
   FROM  jobs J 
      JOIN jobsCategories C 
         ON J.ID = C.jobID
   WHERE 
      J.countryID=1234
   GROUP BY 
      industryID 
   ORDER BY 
      nb DESC;

Hope I'm not misinterpreting the reading, but from what I see, you don't need ANY join. Since your grouping is how many jobs fall under each respective industry, its all in your job categories table, why join to the actual job table for the title of the job since that is not even being returned

select IndustryID,
       count(*) JobsPerIndustry
   from JobCategories
   group by IndustryID

EDIT PER COMMENT / FEEDBACK...

That definitely makes a difference... adding a criteria associated with a job... Ensure your Jobs table has an index on the element you are expecting to allow limiting based on... Then follow similar query like you originally had. Ensure your Jobs table has an index on CountryID.

SELECT
      count(*) as nb,
      industryID 
   FROM  jobs J 
      JOIN jobsCategories C 
         ON J.ID = C.jobID
   WHERE 
      J.countryID=1234
   GROUP BY 
      industryID 
   ORDER BY 
      nb DESC;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文