如何使用这个基本的内连接避免全表扫描?

发布于 2024-08-16 02:46:49 字数 1541 浏览 4 评论 0原文

我有一个表,该表具有指向存储一些 blob 数据的表的外键。当我对主表上的条件进行内部联接时,联接类型从“索引”变为“全部”。我想避免这种情况,因为我的 blob 表约为数十 GB。我怎样才能避免它?

这是基本的内部联接:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id;

1, 'SIMPLE', 'm', 'index', 'fk_blob', 'fk_blob', '4', '', 1, 'Using index'
1, 'SIMPLE', 'b', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'blob_index.m.fkBlob', 1, ''

这里我在主表上添加了一个条件:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id AND m.start < '2009-01-01';
1, 'SIMPLE', 'b', 'ALL', 'PRIMARY', '', '', '', 1, ''
1, 'SIMPLE', 'm', 'ref', 'fk_blob,index_start', 'fk_blob', '4', 'blob_index.b.id', 1, 'Using where'

请注意,表的列出顺序已更改。由于我添加了有关主表的条件,它现在正在对 blob 表进行全表扫描。

这是架构:

 DROP TABLE IF EXISTS `blob_index`.`metadata`;
    CREATE TABLE  `blob_index`.`metadata` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `fkBlob` int(10) unsigned NOT NULL,
      `start` datetime NOT NULL,
      PRIMARY KEY (`id`),
      KEY `fk_blob` (`fkBlob`),
      KEY `index_start` (`start`),
      CONSTRAINT `fk_blob` FOREIGN KEY (`fkBlob`) REFERENCES `blobstore` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


    DROP TABLE IF EXISTS `blob_index`.`blobstore`;
    CREATE TABLE  `blob_index`.`blobstore` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `data` mediumblob NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I have a table that has a foreign key to a table that stores some blob data. When I do an inner join on the tables with a condition on the main table the join type goes from 'index' to 'ALL'. I would like to avoid this as my blob table is on the order of tens of gigabytes. How can I avoid it?

Here is the the basic inner join:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id;

1, 'SIMPLE', 'm', 'index', 'fk_blob', 'fk_blob', '4', '', 1, 'Using index'
1, 'SIMPLE', 'b', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'blob_index.m.fkBlob', 1, ''

Here I add a condition on the main table:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id AND m.start < '2009-01-01';
1, 'SIMPLE', 'b', 'ALL', 'PRIMARY', '', '', '', 1, ''
1, 'SIMPLE', 'm', 'ref', 'fk_blob,index_start', 'fk_blob', '4', 'blob_index.b.id', 1, 'Using where'

Notice that the order in which the tables are listed has changed. It is now doing a full table scan on the blob table because of a condition I've added regarding the main table.

Here is the schema:

 DROP TABLE IF EXISTS `blob_index`.`metadata`;
    CREATE TABLE  `blob_index`.`metadata` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `fkBlob` int(10) unsigned NOT NULL,
      `start` datetime NOT NULL,
      PRIMARY KEY (`id`),
      KEY `fk_blob` (`fkBlob`),
      KEY `index_start` (`start`),
      CONSTRAINT `fk_blob` FOREIGN KEY (`fkBlob`) REFERENCES `blobstore` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


    DROP TABLE IF EXISTS `blob_index`.`blobstore`;
    CREATE TABLE  `blob_index`.`blobstore` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `data` mediumblob NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

脱离于你 2024-08-23 02:46:49

我猜你正在空表上尝试这个(因为MySQL认为它需要遍历一行来进行全表扫描),这可能会影响调度程序的结果。当您在真实的桌子上执行此操作时,EXPLAIN 结果可能会有所不同(实际上在我的测试中确实有所不同)。

I guess you are trying this on empty table (because MySQL thinks it needs to go through one row to do full table scan), what might influence results of scheduler. When you will do it on real table, the EXPLAIN results might vary (and actually did vary in my test).

日记撕了你也走了 2024-08-23 02:46:49

优化器认为您的查询将从交换表顺序中受益(这很可能意味着统计信息不是最新的)。

您可以尝试在元数据(start,fkBlob)上添加索引:

CREATE INDEX ix_metadata_start_blob ON metadata (start, fkBlob)

并在两个表上运行ANALYZE TABLE

这样,start 上的索引将用于过滤将成为前导的metadata

您还可以显式强制 join: 的顺序

SELECT  *
FROM    metadata m
STRAIGHT_JOIN
        blobstore b
ON      b.id = m.fkBlob
WHERE   m.start <= '2009-01-01'

,但通常不建议这样做。

The optimizer thinks you query will benefit from swapping the table order (which most probably means that statistics are not up to date).

You can try adding an index on metadata (start, fkBlob):

CREATE INDEX ix_metadata_start_blob ON metadata (start, fkBlob)

and run ANALYZE TABLE on both tables.

This way, the index on start will be used for filtering on metadata which will be made leading.

You can also explicitly force the order of the join:

SELECT  *
FROM    metadata m
STRAIGHT_JOIN
        blobstore b
ON      b.id = m.fkBlob
WHERE   m.start <= '2009-01-01'

, though it's usually not recommended.

貪欢 2024-08-23 02:46:49

如果我正确阅读了您发布的内容,它会从 indexrefeq_refall

CREATE INDEX idx_metadata USING BTREE ON `metadata` (fkBlob,start);

应该马上拿回来。

If I read what you posted right it goes from index to ref and eq_ref to all.

CREATE INDEX idx_metadata USING BTREE ON `metadata` (fkBlob,start);

Should take it right back.

听风念你 2024-08-23 02:46:49
if the index doesnot take it right use HINTS

select /* INDEX <index_name> */
blah blah blah
from ........
if the index doesnot take it right use HINTS

select /* INDEX <index_name> */
blah blah blah
from ........
不必在意 2024-08-23 02:46:49

在第一个示例中,MySQL 使用元数据 fk_blob 索引,因为它是覆盖索引——查询中使用的每一列都存在于索引中。 (这就是“使用索引”的含义。)该查询仍然执行完整扫描,但它通过辅助索引而不是主索引扫描每一行。一旦你使用start,你就失去了覆盖索引,MySQL计算出使用blobstore作为驱动索引会更快。 (InnoDB 的主索引与行存储集成在一起。)

如果您希望 MySQL 继续使用元数据索引作为驱动索引,请确保其上有一个对查询有用的索引。 (start, fkBlob) 上的索引最适合第二个查询,但对于其他查询可能没有用。下一个最佳索引是将 (fkBlob) 替换为 (fkBlob, start)。您必须在拥有太多索引(维护成本高昂)与拥有高效查询计划之间取得平衡。测试、测试、测试——永远不要盲目相信你的开发数据库上的解释。

In the first example, MySQL used the metadata fk_blob index because it was a covering index--every column you used in the query was present in the index. (This is what "using index" means.) That query still did a full scan, but it scanned every row via the secondary index instead of the primary. As soon as you used start, you lost the covering index and MySQL calculated it was faster to use blobstore as the driving index. (InnoDB's primary index is integrated with the row storage.)

If you want MySQL to continue to use a metadata index as the driving index, make sure there's a single index on it that will be useful for the query. An index on (start, fkBlob) would be best for the second query, but that may not be useful for other queries. The next best index is to replace (fkBlob) with (fkBlob, start). You'll have to balance having too many indexes (which are expensive to maintain) vs having efficient query plans. Test, test, test--and never blindly believe explain on your dev database.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文