Why does removing the BINARY function call from my SQL query change the query plan so drastically?
I have a SQL query that looks up a specific value in a table and then inner-joins three tables to produce the result set. The three tables are fabric_barcode_oc, fabric_barcode_items and fabric_barcode_rolls.
Initial query
Running EXPLAIN ANALYZE against the initial version of the query, shown below,
EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE BINARY `roll`.`roll_number` = 'dZkzHJ_je8'
I get the following:
"-> Nested loop inner join (cost=468160.85 rows=582047) (actual time=0.063..254.186 rows=1 loops=1)
-> Nested loop inner join (cost=264444.40 rows=582047) (actual time=0.057..254.179 rows=1 loops=1)
-> Filter: (cast(roll.roll_number as char charset binary) = 'dZkzHJ_je8') (cost=60727.95 rows=582047) (actual time=0.047..254.169 rows=1 loops=1)
-> Table scan on roll (cost=60727.95 rows=582047) (actual time=0.042..198.634 rows=599578 loops=1)
-> Single-row index lookup on oc using PRIMARY (oc_unique_id=roll.oc_unique_id) (cost=0.25 rows=1) (actual time=0.009..0.009 rows=1 loops=1)
-> Single-row index lookup on item using PRIMARY (item_unique_id=roll.item_unique_id_fk) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=1)
"
Updated query
I changed the query to:
EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'
which produces the following execution plan:
"-> Rows fetched before execution (cost=0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
The only difference between the two queries is that I removed the BINARY function call. I'm puzzled as to why the plans are so different.
Execution times
Query 1 executes in roughly 375 ms, while the second query executes in roughly 160 ms. What causes this difference?
As requested, here is the definition of fabric_barcode_rolls:

CREATE TABLE `fabric_barcode_rolls` (
`roll_unique_id` int NOT NULL AUTO_INCREMENT,
`oc_unique_id` int NOT NULL,
`item_unique_id_fk` int NOT NULL,
`roll_number` char(30) NOT NULL,
`roll_length` decimal(10,2) DEFAULT '0.00',
`po_number` char(22) DEFAULT NULL,
`roll_utilized` decimal(10,2) DEFAULT '0.00',
`user` char(30) NOT NULL,
`mir_number` char(22) DEFAULT NULL,
`mir_location` char(10) DEFAULT NULL,
`mir_stamp` datetime DEFAULT NULL,
`creation_stamp` datetime DEFAULT CURRENT_TIMESTAMP,
`update_stamp` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`roll_unique_id`),
UNIQUE KEY `roll_number` (`roll_number`),
KEY `fabric_barcode_item_fk` (`item_unique_id_fk`),
CONSTRAINT `fabric_barcode_item_fk` FOREIGN KEY (`item_unique_id_fk`) REFERENCES `fabric_barcode_items` (`item_unique_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=610684 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 Answer
Your performance difference is due to this fact: in MySQL, collations on VARCHAR() and CHAR() columns are baked into the indexes.
Edit: updated to match the table definition.
Your fabric_barcode_rolls table has the column defined like this, with a unique index over it:

`roll_number` char(30) NOT NULL,
UNIQUE KEY `roll_number` (`roll_number`),

So, your

WHERE ... BINARY roll.roll_number = 'dZkzHJ_je8'

filter clause is not sargable: it can't use the index on that column. But

WHERE ... roll.roll_number = 'dZkzHJ_je8'

is sargable: it does use the index. So it's fast. But the column's default collation is case-insensitive. So, it's fast and wrong. That can be fixed.
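If altering the table isn't immediately possible, one query-side workaround (my sketch, not part of the original answer) keeps the sargable equality for the index lookup and re-applies the BINARY comparison only to the rows that lookup returns:

```sql
-- The first predicate is sargable and drives a lookup on the unique
-- index over roll_number; the BINARY predicate then discards a
-- wrong-case match from that tiny row set.
SELECT `oc`.`oc_number` AS `ocNumber`, `roll`.`po_number` AS `poNumber`,
       `item`.`item_code` AS `itemCode`, `roll`.`roll_length` AS `rollLength`,
       `roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'
  AND BINARY `roll`.`roll_number` = 'dZkzHJ_je8';
```

Because the unique key is enforced under the case-insensitive collation, the index lookup returns at most one row, and the byte-exact filter only verifies its case.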
Notice there's no collation declaration on the column. That means it uses the table's default, utf8mb4_0900_ai_ci, a case-insensitive collation. What you want for an ordinary barcode column is a one-byte-per-character charset and a case-sensitive collation, which means altering the column's charset and collation.
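The table change the answer alludes to might look like this (a sketch, assuming the barcodes are ASCII-only; ascii_bin is the binary collation of the one-byte ascii charset):

```sql
-- Rebuild roll_number with a one-byte charset and a case-sensitive
-- (binary) collation; the unique index over it becomes shorter and
-- compares byte-for-byte, so plain equality is both sargable and exact.
ALTER TABLE `fabric_barcode_rolls`
  MODIFY `roll_number` CHAR(30)
    CHARACTER SET ascii COLLATE ascii_bin NOT NULL;
```

After this change the original query, without BINARY, does the case-sensitive single-row index lookup directly.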
This is a multilevel win. Using the correct character set for your barcodes saves space. It makes the index shorter and more efficient to use. It gives you case-sensitive (binary-match) lookups, which themselves make the index shorter and much more efficient. And it removes the collision risk between barcodes that differ only in letter case.
Before you conclude that the collision risk is so low you don't have to worry about it, please read about the birthday paradox.
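To put rough numbers on that (my own back-of-the-envelope arithmetic, not from the answer): the birthday approximation p ≈ 1 − e^(−n²/2N) says that case-folding a 10-character alphanumeric code shrinks the keyspace N from 62^10 to 36^10, so for roughly the 600,000 rows this table holds:

```sql
-- Birthday-bound probability of at least one collision among
-- ~600,000 uniformly random 10-character codes:
-- 62 symbols when case is significant, 36 after case folding.
SELECT
  1 - EXP(-(600000.0 * 599999.0) / (2 * POW(62, 10))) AS p_case_sensitive,
  1 - EXP(-(600000.0 * 599999.0) / (2 * POW(36, 10))) AS p_case_insensitive;
```

Both probabilities are small at this row count, but the case-insensitive keyspace is (62/36)^10 ≈ 230 times riskier, and both grow roughly quadratically as the table grows.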