mysql 查询速度慢/可能的查询问题

发布于 2024-11-07 07:32:43 字数 3134 浏览 0 评论 0原文

我有 2 个(类似的)查询:

-- Query #1 - get all new products not in currently in the Product table
-- Should match any products in the temp table that do not exist in the Product table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, \'Activated\'
  FROM `tmpImport` t
  LEFT JOIN `Product` p USING (`ProductId`)
 WHERE p.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

-- Query #2 - get all Products that are removed from the Product table
-- Should match any products in the Product table that do not exist in the temp table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, \'Deactivated\'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

在第一次运行(第 1 天)时,当 Product 表为空时,两个查询都在 1 秒内运行,但是,在第二次运行时(第 2 天),当 Product 表有 14000 条记录时,第一个查询运行时间不到 2 秒,第二个查询运行时间为 244 秒。每次连续的数据导入都是相同的(查询 #2 为 240-250 秒。在检查数据库时,所有数据似乎都是正确的,我只是不明白为什么第二个查询花了这么长时间。

---> 编辑:慢查询日志:

# Query_time: 245.328784  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 187711973
SET timestamp=1305151558;

INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`);

此时我最关心的是: Rows_examined: 187711973 (它到底是如何检查这么多行的?) Product 表大小约为 14000 条记录,导入表大小为~28000 条记录,tmpProductState 最多为 60 条记录

---> 另一个编辑:

EXPLAIN SELECT p.`ProductId` , p.`LastSeenDate` , 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t
USING ( `ProductId` )
WHERE t.`ProductId` IS NULL 

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          p       ALL     NULL            NULL    NULL            NULL    14151
1   SIMPLE          t       index   NULL            PRIMARY 100             NULL    28166   Using where; Using index; Not exists

涉及的表:

CREATE TABLE IF NOT EXISTS `tmpImport` (
  `CategoryId`             smallint(5) unsigned NOT NULL,
  `ProcessedOn`            date DEFAULT NULL,
  `ProductId`              varchar(32) NOT NULL,
  `Title`                  varchar(255) DEFAULT NULL,
  `Description`            text,
  `ActivateDate`           date DEFAULT NULL,
  PRIMARY KEY (`CategoryId`,`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET = UTF8

CREATE TABLE IF NOT EXISTS `tmpProductState` (
  `ProductId` VARCHAR(32) NOT NULL,
  `ChangedOn` DATE NOT NULL,
  `State` ENUM(\'Activated\',\'Deactivated\'),
  PRIMARY KEY(`ProductId`,`ChangedOn`)
) ENGINE = Memory

CREATE TABLE `Product` (
  `ProductId` varchar(32) NOT NULL,
  `Title` varchar(255) DEFAULT NULL,
  `Description` text,
  `ActivateDate` date DEFAULT NULL,
  `LastSeenDate` date DEFAULT NULL,
  PRIMARY KEY (`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

I have 2 (similar) queries:

-- Query #1 - get all new products not in currently in the Product table
-- Should match any products in the temp table that do not exist in the Product table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, \'Activated\'
  FROM `tmpImport` t
  LEFT JOIN `Product` p USING (`ProductId`)
 WHERE p.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

-- Query #2 - get all Products that are removed from the Product table
-- Should match any products in the Product table that do not exist in the temp table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, \'Deactivated\'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

On first run, (day 1), when the Product table is empty, both queries run in under 1 second, however, on second run, (day 2), when the Product table has 14000 records, the first query runs in under 2 seconds, and the second query runs in 244 seconds. Each consecutive data import is the same (240-250 seconds for query # 2. On checking the database, all data seems correct, I just can't figure out why the second query is taking so long.

---> Edit: Slow Query log:

# Query_time: 245.328784  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 187711973
SET timestamp=1305151558;

INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`);

What concerns me most at this point: Rows_examined: 187711973 (how exactly is it examining that many rows?) The Product table size is ~14000 records, and the import table size is ~28000 records, and the tmpProductState is at most 60 records.

---> Another Edit:

EXPLAIN SELECT p.`ProductId` , p.`LastSeenDate` , 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t
USING ( `ProductId` )
WHERE t.`ProductId` IS NULL 

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          p       ALL     NULL            NULL    NULL            NULL    14151
1   SIMPLE          t       index   NULL            PRIMARY 100             NULL    28166   Using where; Using index; Not exists

Tables Involved:

CREATE TABLE IF NOT EXISTS `tmpImport` (
  `CategoryId`             smallint(5) unsigned NOT NULL,
  `ProcessedOn`            date DEFAULT NULL,
  `ProductId`              varchar(32) NOT NULL,
  `Title`                  varchar(255) DEFAULT NULL,
  `Description`            text,
  `ActivateDate`           date DEFAULT NULL,
  PRIMARY KEY (`CategoryId`,`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET = UTF8

CREATE TABLE IF NOT EXISTS `tmpProductState` (
  `ProductId` VARCHAR(32) NOT NULL,
  `ChangedOn` DATE NOT NULL,
  `State` ENUM(\'Activated\',\'Deactivated\'),
  PRIMARY KEY(`ProductId`,`ChangedOn`)
) ENGINE = Memory

CREATE TABLE `Product` (
  `ProductId` varchar(32) NOT NULL,
  `Title` varchar(255) DEFAULT NULL,
  `Description` text,
  `ActivateDate` date DEFAULT NULL,
  `LastSeenDate` date DEFAULT NULL,
  PRIMARY KEY (`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

冰雪梦之恋 2024-11-14 07:32:43

您的表没有标准化,您没有有效的索引,并且您的连接......不寻常。

我假设您无法对表之间的数据总重复做任何事情,所以让我们忽略这一点。

看起来您正在连接的表中的列之间复制数据,您应该在连接中使用所有这些列。所以它可能应该是:

LEFT JOIN `tmpImport` t USING (`ProductId`, `Title`, `Description`, `ActivateDate`)

将索引添加到与您要连接或过滤的字段相对应的表中。不要将复合键作为主键。相反,添加一个自动递增字段作为 PK,并在需要强制唯一性时使用唯一的键。产品表和 tmpImport 都应该有连接的每一列的键。

希望其中一些想法对您有所帮助。

Your tables are not normalized, you don't have effective indexes, and your joins are... unusual.

I assume you can't do anything about the gross duplication of data across tables, so lets ignore that.

Looks like you are duplicating data across columns in the tables you are joining, you should use all of those columns in your join. So it should probably be:

LEFT JOIN `tmpImport` t USING (`ProductId`, `Title`, `Description`, `ActivateDate`)

Add indexes to your tables that correspond to the fields you are joining or filtering on. DO NOT have a compound key as your primary key. Instead add an auto incrementing field to be the PK, and have a unique key if you need to enforce uniqueness. Both the product table and tmpImport should have keys for each column being joined.

Hope some of these ideas help you out.

冷心人i 2024-11-14 07:32:43

很晚才回复此问题,但您的第一个查询是从 tmpImport 获取所有记录,并且他们使用 Product 表上的主键从 Product 获取所有记录。这非常有效。第二个查询是从 Product 获取所有记录,然后从 tmpImport 获取匹配记录,但没有使用 ProductId 上的任何索引。 em>tmpImport。因此运行得很糟糕。

ProductId 上的索引添加到 tmpImport 表中(此连接将忽略主键中的 ProductId,因为它不是表中的第一列密钥,并且您没有使用第一个 CategoryId )。

Very late reply to this, but your first query is getting all the records from tmpImport and they getting all the records from Product using the primary key on the Product table. This is pretty efficient. The second query is getting all the records from Product and then getting the matching records from tmpImport but without the benefit of any index on ProductId on tmpImport. Hence runs badly.

Add an index on ProductId to the tmpImport table (the ProductId in the primary key is ignored for this join as it is not the first column in the key and you are not using CategoryId which is the first ).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文