Index not being used in MySQL with a larger dataset

My query looks like this:

EXPLAIN SELECT 
    r.owner_id, 
    r.owner_address, 
    r.owner_platform,
    r.updated_at 
FROM some_owner_table as r 
WHERE 
    r.updated_at > '2022-09-16 22:16:38.832' 
ORDER BY 
    r.updated_at DESC LIMIT 200;

The result is

# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'r', NULL, 'range', 'INDEX_by_updated_at', 'INDEX_by_updated_at', '6', NULL, '1', '100.00', 'Using index condition'

However, if we use a different date that (I think) increases the number of results, we get:

'1', 'SIMPLE', 'r', NULL, 'ALL', 'INDEX_by_updated_at', NULL, NULL, NULL, '263', '37.64', 'Using where; Using filesort'

Using filesort seems problematic in terms of performance. It's no longer showing Using index condition.

Is this how indexing really works or can we do something to further optimize our queries for this table?

EDIT: Table has 263 total rows.

EDIT: CREATE TABLE statement:

CREATE TABLE `some_owner_table` (
  `owner_id` bigint(20) NOT NULL,
  `owner_address` bigint(20) NOT NULL,
  `owner_platform` int(11) NOT NULL,
  `updated_at` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  PRIMARY KEY (`owner_id`,`owner_platform`),
  KEY `INDEX_by_updated_at` (`updated_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

浪漫之都 2025-02-18 15:15:55

MySQL's optimizer generally chooses to skip using the index if it can infer that your condition would match a large enough portion of the table. In my experience, the threshold is about 20%, but this is not an official feature, and it may be different if the MySQL Server code changes.

The reason is that it actually takes more work per row to do an index lookup and then fetch the whole row from that index entry. The optimizer may assume that there's a point at which it's more economical to just walk the table row by row in primary key order and keep only the rows that match the condition.

But if the specific value you're searching for occurs on a small subset of the rows, then it's more economical to select those few index entries, then fetch the corresponding rows.

If you think the optimizer has made the wrong choice, you can use the FORCE INDEX hint, which makes the optimizer treat a table-scan as infinitely costly, so if the index is relevant at all, it'll use it.

It might be worthwhile, for example, to avoid the filesort. That is, if you force the query to scan rows by reading the index on updated_at, then sorting becomes a no-op, and it will avoid the filesort.
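
For reference, a sketch of what that hint could look like on the query from the question; the FORCE INDEX placement follows standard MySQL syntax, and whether it actually pays off should be checked with EXPLAIN:

-- Sketch: force the optimizer to read via INDEX_by_updated_at, so rows come
-- back already in updated_at order and the filesort is skipped.
SELECT 
    r.owner_id, 
    r.owner_address, 
    r.owner_platform,
    r.updated_at 
FROM some_owner_table as r 
    FORCE INDEX (INDEX_by_updated_at)
WHERE 
    r.updated_at > '2022-09-16 22:16:38.832' 
ORDER BY 
    r.updated_at DESC LIMIT 200;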

Another idea: If you're testing a very small dataset, the optimizer could reason that using an index doesn't matter, because the number of rows is going to fit in such a small number of pages in RAM anyway, and the cost of searching or sorting will be trivial.

This is why you should test optimization with a larger sample size, because the optimizer might make different choices for a trivially-sized dataset.
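
If it helps, here is one hedged way to build such a larger sample; it assumes MySQL 8.0+ (for WITH RECURSIVE), and the copy-table name, row count, and synthetic values are arbitrary choices for illustration:

-- Sketch: clone the table structure (including its indexes) and fill it
-- with ~100k synthetic rows so the optimizer has something realistic to work with.
CREATE TABLE some_owner_table_big LIKE some_owner_table;

SET SESSION cte_max_recursion_depth = 100000;

INSERT INTO some_owner_table_big (owner_id, owner_address, owner_platform, updated_at)
SELECT n,                              -- synthetic owner_id
       n,                              -- synthetic owner_address
       1,                              -- one platform keeps the primary key unique
       NOW(3) - INTERVAL n SECOND      -- spread updated_at back over time
FROM (
    WITH RECURSIVE seq (n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 100000
    )
    SELECT n FROM seq
) AS numbers;

Re-running the EXPLAIN against the bigger copy should show whether the range scan on INDEX_by_updated_at gets picked once the matching slice is a small fraction of the table.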

此生挚爱伱 2025-02-18 15:15:55

The query planner in MySQL / MariaDB makes cost-based choices about whether to use indexes. Oversimplifying, it guesses how much CPU and IO time it will take to satisfy the query with or without using indexes, and chooses the smallest guess. Like @BillKarwin said, those choices can be very different for small (hundreds of rows) and large (millions) tables. So don't waste a whole ton of your time EXPLAINing queries on small tables.

You have the WHERE col > constant ORDER BY col DESC LIMIT 200 pattern in your query. When your query doesn't use an index, the server does these things, roughly:

  • scans the table row-by-row looking for rows matching the WHERE clause.
  • places those rows into an internal data structure, a sort of lightweight table. That table, when mentioned in EXPLAIN output, is called a "file". It's not a file-system file.
  • when the table scan is done, it sorts that "file" as specified by ORDER BY.
  • finally, it sends you the first 200 rows from the sorted "file" and discards the rest.

But you have an index on the column in question. When it uses the index the server can work more efficiently. It

  • random-accesses the index for the first row matching the WHERE clause.
  • scans sequentially through the index until the last matching row
  • retrieves and sends you each matching row
  • and stops when it has sent 200 rows.

No sorting needed. (Hooray.)
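
As a side note, on MySQL 8.0.16+ the plan tree makes these two shapes easy to tell apart; this is just a hedged way to inspect them, not something the steps above require:

-- The tree output should contain either a Sort node feeding the LIMIT
-- (the table-scan-plus-filesort shape) or an index range scan on
-- INDEX_by_updated_at with no Sort node (the index-order shape).
EXPLAIN FORMAT=TREE
SELECT r.owner_id, r.owner_address, r.owner_platform, r.updated_at
FROM some_owner_table as r
WHERE r.updated_at > '2022-09-16 22:16:38.832'
ORDER BY r.updated_at DESC LIMIT 200;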

If the query you mentioned has to be faster than it is in production, you can try recreating your index in descending order like this:

ALTER TABLE some_owner_table 
     DROP KEY INDEX_by_updated_at,
      ADD KEY INDEX_by_updated_at 
          (updated_at DESC);

This puts your index in descending order, matching your ORDER BY clause exactly. It is slightly less expensive to scan an index forward rather than backward, and this revised index lets that happen.

If it's still not fast enough, you can try creating a covering index. A covering index allows the server to satisfy the query completely from the index, rather than going from the index to the table* to look up every row's data. That looks like this.

ALTER TABLE some_owner_table 
     DROP KEY INDEX_by_updated_at,
      ADD KEY INDEX_by_updated_at 
           (updated_at DESC, owner_id, owner_address, owner_platform);

An index like this takes tablespace (SSD or disk space) and slightly slows down INSERTs and UPDATEs, because the server must update the index as well as the table. But tablespace is cheap. And most tables don't get nearly as many INSERTs or UPDATEs as they do SELECTs.

* In InnoDB, the table is actually stored inside the primary key. It's called a clustered index.
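
One hedged way to confirm both suggestions took effect, assuming MySQL 8.0+ (earlier versions parse the DESC keyword but still build an ascending index), is to re-run the EXPLAIN from the question:

-- With the covering index in place, Extra should include "Using index"
-- (the query is answered from the index alone) and no "Using filesort";
-- on MySQL 8.0+ an ascending index read in reverse would instead show
-- "Backward index scan".
EXPLAIN SELECT 
    r.owner_id, 
    r.owner_address, 
    r.owner_platform,
    r.updated_at 
FROM some_owner_table as r 
WHERE 
    r.updated_at > '2022-09-16 22:16:38.832' 
ORDER BY 
    r.updated_at DESC LIMIT 200;

On the original 263-row table the optimizer may still prefer a full scan, so this check is more meaningful against a larger sample.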
