MySQL: Optimizing a query with DISTINCT

Posted on 2024-12-02 14:02:02

In my Java application I have found a small performance issue caused by this simple query:

 SELECT DISTINCT a
   FROM `table`
  WHERE checked = 0
  LIMIT 10000

I have an index on the checked column.

At first, when almost all rows have checked = 0, the query is very fast. But as I mark more and more rows as checked, the query becomes very slow (up to several minutes).

How can I improve the performance of this query? Should I add a composite index on

  • a, checked

or rather

  • checked, a?

My table has many millions of rows, which is why I don't want to test this by trial and error and hope for a lucky guess.
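To look at the starting point on a small scale, here is a minimal sketch that rebuilds the setup described above and prints the query plan. It assumes a hypothetical table name items (table is a reserved word in MySQL) and uses SQLite from Python's standard library only because it needs no server; MySQL's EXPLAIN output and plan choices will differ. With only the single-column index on checked, the engine can locate the checked = 0 rows through the index, but the index does not contain a, so each matching row still has to be read from the table.

```python
import sqlite3

# Hypothetical stand-in for the asker's table. SQLite is used only because it
# ships with Python; MySQL's EXPLAIN output and plan choices will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a INTEGER, checked INTEGER)")
conn.executemany(
    "INSERT INTO items (a, checked) VALUES (?, ?)",
    [(i % 50, 0) for i in range(1000)],  # 50 distinct a values, nothing checked yet
)
# The single-column index described in the question.
conn.execute("CREATE INDEX idx_checked ON items (checked)")

query = "SELECT DISTINCT a FROM items WHERE checked = 0 LIMIT 10000"
# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

rows = conn.execute(query).fetchall()
print(len(rows), "distinct values")
```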

Comments (5)

他是夢罘是命 2024-12-09 14:02:02

I would add an index on (checked, a). This means the value you're returning is already in the index, so there's no need to go back to the table to fetch it. Secondly, if you're doing lots of individual updates to the table, there's a good chance both the table and the index have become fragmented on disk. Rebuilding (compacting) the table and index can significantly improve performance.
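A minimal sketch of the covering-index idea, again using SQLite as a stand-in and the hypothetical table name items: with an index on (checked, a), the plan reports a covering index, meaning the query is answered from index entries alone without visiting the table rows. Because a is sorted within each checked value, equal values of a also arrive next to each other, which can let the engine drop duplicates without a separate sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a INTEGER, checked INTEGER)")
conn.executemany(
    "INSERT INTO items (a, checked) VALUES (?, ?)",
    [(i % 50, int(i % 3 == 0)) for i in range(1000)],  # about a third checked
)
# Composite index from the answer: the equality column first, the returned column second.
conn.execute("CREATE INDEX idx_checked_a ON items (checked, a)")

query = "SELECT DISTINCT a FROM items WHERE checked = 0 LIMIT 10000"
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)  # expect a line mentioning COVERING INDEX idx_checked_a

rows = conn.execute(query).fetchall()
```

For the rebuild step, MySQL's statement is OPTIMIZE TABLE; for InnoDB tables it is mapped to ALTER TABLE ... FORCE, which rebuilds the table and its indexes.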

You can also rewrite the query as follows, in case the optimizer does not recognize that it is equivalent:

  SELECT a
    FROM `table`
   WHERE checked = 0
GROUP BY a
   LIMIT 10000
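A quick way to convince yourself that the two forms are equivalent is to run both on the same data and compare the result sets. This sketch uses SQLite and a hypothetical items table. It leaves out the LIMIT on purpose: neither query has an ORDER BY, so when the LIMIT is smaller than the number of distinct values, neither form guarantees which values come back, and limited results could differ even though the queries are equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a INTEGER, checked INTEGER)")
conn.executemany(
    "INSERT INTO items (a, checked) VALUES (?, ?)",
    [(i % 7, int(i % 4 == 0)) for i in range(200)],
)

distinct_values = {r[0] for r in conn.execute(
    "SELECT DISTINCT a FROM items WHERE checked = 0")}
group_by_values = {r[0] for r in conn.execute(
    "SELECT a FROM items WHERE checked = 0 GROUP BY a")}

# No ORDER BY in either query, so compare sets, not row order.
print(distinct_values == group_by_values, sorted(distinct_values))
```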
何其悲哀 2024-12-09 14:02:02

Add an index on the DISTINCT column (a in this case). MySQL is able to use this index for the DISTINCT.

MySQL may also take advantage of a compound index on (a, checked) (the order matters: the DISTINCT column has to come first in the index). Try both and compare the results with your data and your queries.

(After adding this index, you should see "Using index for group-by" in the EXPLAIN output.)

See "GROUP BY Optimization" in the MySQL manual. (A DISTINCT is very similar to a GROUP BY.)

"The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions."
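The idea in the quoted passage can be modelled in a few lines. The sketch below is a simplified in-memory model, not MySQL's implementation: a sorted list of (a, checked) tuples stands in for the entries of an index on (a, checked), and bisect_left stands in for an index seek. The function name loose_index_scan and the data are hypothetical. For each distinct a it does one seek to check whether an (a, 0) entry exists and one seek to jump to the next group, so the work grows with the number of distinct a values rather than the number of rows.

```python
from bisect import bisect_left


def loose_index_scan(index_keys, limit):
    """Simplified model of a loose index scan answering
    SELECT DISTINCT a ... WHERE checked = 0 ... LIMIT limit
    over an index on (a, checked).

    index_keys: sorted list of (a, checked) tuples standing in for index entries.
    Returns the matching a values and the number of seeks performed.
    """
    result, seeks, pos = [], 0, 0
    while pos < len(index_keys) and len(result) < limit:
        a = index_keys[pos][0]
        # Seek 1: jump to where (a, 0) would be and check whether it exists.
        hit = bisect_left(index_keys, (a, 0), pos)
        seeks += 1
        if hit < len(index_keys) and index_keys[hit] == (a, 0):
            result.append(a)
        # Seek 2: skip the rest of this a group without reading its entries.
        pos = bisect_left(index_keys, (a, float("inf")), pos)
        seeks += 1
    return result, seeks


# Hypothetical data: 10 values of a with 100 entries each; only a = 3 and a = 7
# still have unchecked rows (5 each).
keys = sorted(
    (a, 0 if a in (3, 7) and i < 5 else 1) for a in range(10) for i in range(100)
)
found, seeks = loose_index_scan(keys, limit=10000)
print(found, seeks, len(keys))  # [3, 7] 20 1000
```

The payoff depends on the data: when each a value has many rows, the skips save a lot, while if almost every row has its own a there is little to skip.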

半寸时光 2024-12-09 14:02:02

"My table has a lot of millions of rows <...> where almost all rows have checked=0"

In this case it seems that the best index would be a simple one on (a).

UPDATE:

It was not clear how many rows get checked. From your comment below the question:

"At the beginning 0 is in 100% rows, but at the end of the day it will become 0%"

This changes everything. So @Ben has the correct answer.
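Why the share of checked = 0 rows matters so much for an index on (a) alone can be seen with a rough cost model. The sketch assumes the engine walks that index in a order, has to look at checked for every entry it reads (checked is not part of an index on (a) only), and stops once LIMIT distinct values are found. The function rows_examined and the data are hypothetical; they only count entries read.

```python
def rows_examined(index_entries, limit):
    """Rough cost model for SELECT DISTINCT a ... WHERE checked = 0 LIMIT limit
    when only an index on (a) is available.

    index_entries: (a, checked) pairs in index order (sorted by a). Every entry
    read counts as examined, because checked has to be looked up for each one.
    """
    found, examined, last = 0, 0, None
    for a, checked in index_entries:
        if found >= limit:
            break  # LIMIT reached; stop reading the index
        examined += 1
        if checked == 0 and a != last:
            found += 1
            last = a
    return examined


GROUPS, PER_GROUP = 1000, 10  # hypothetical: 1000 values of a, 10 rows each

# Start of the day: nothing is checked yet.
morning = [(a, 0) for a in range(GROUPS) for _ in range(PER_GROUP)]
# End of the day: only the last row of the 100 highest a values is still unchecked.
evening = [
    (a, 0 if a >= GROUPS - 100 and i == PER_GROUP - 1 else 1)
    for a in range(GROUPS)
    for i in range(PER_GROUP)
]

print(rows_examined(morning, 100), rows_examined(evening, 100))  # 991 10000
```

With almost everything unchecked, the scan stops after reading roughly one group per value it returns; with almost everything checked it reads every entry. An index on (checked, a) avoids the second case because it only contains the checked = 0 entries in the range it scans.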

心凉怎暖 2024-12-09 14:02:02

I have found a completely different solution which should do the trick: I will simply create a new table with all possible unique "a" values. This will allow me to avoid DISTINCT.
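The answer does not show how the new table is queried, so the following is only a sketch under assumptions: the lookup table is called a_values (hypothetical), it holds each possible a once, and the query still wants the a values that have at least one unchecked row. An EXISTS probe per candidate value can then use an index on (a, checked) to answer each check with a single lookup. SQLite stands in for MySQL here; INSERT OR IGNORE plays the role MySQL's INSERT IGNORE would play in keeping the lookup table free of duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, a INTEGER, checked INTEGER);
    CREATE INDEX idx_a_checked ON items (a, checked);
    -- Hypothetical lookup table: every possible a value exactly once.
    CREATE TABLE a_values (a INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO items (a, checked) VALUES (?, ?)",
    [(i % 20, int(i % 20 < 15)) for i in range(400)],  # a = 15..19 stay unchecked
)
# The primary key rejects duplicates; OR IGNORE skips them instead of failing.
conn.execute("INSERT OR IGNORE INTO a_values (a) SELECT a FROM items")

query = """
    SELECT v.a
      FROM a_values AS v
     WHERE EXISTS (SELECT 1 FROM items AS t WHERE t.a = v.a AND t.checked = 0)
     LIMIT 10000
"""
pending = sorted(r[0] for r in conn.execute(query))
print(pending)  # [15, 16, 17, 18, 19]
```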

橙味迷妹 2024-12-09 14:02:02

You don't state it, but are you updating the index regularly? As changes occur to the underlying data, the index becomes less and less accurate and processing gets worse and worse. If you have an index on checked, and checked is being updated over time, you need to make sure your index is updated accordingly on a regular basis.
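One caveat about this answer: in MySQL, B-tree index entries are updated as part of every INSERT, UPDATE and DELETE, so the index itself does not fall behind the data. What can go stale are the optimizer's statistics about the index, which ANALYZE TABLE refreshes. The sketch below shows the same distinction in SQLite (used only as a stand-in, with a hypothetical items table): counts read through the index are correct immediately after an UPDATE, and ANALYZE only refreshes the planner statistics stored in sqlite_stat1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a INTEGER, checked INTEGER)")
conn.execute("CREATE INDEX idx_checked ON items (checked)")
conn.executemany(
    "INSERT INTO items (a, checked) VALUES (?, ?)",
    [(i % 50, 0) for i in range(1000)],
)

count_sql = "SELECT COUNT(*) FROM items INDEXED BY idx_checked WHERE checked = 0"
before = conn.execute(count_sql).fetchone()[0]

# Mark 900 of the 1000 rows as checked (ids run from 1 to 1000).
conn.execute("UPDATE items SET checked = 1 WHERE id % 10 != 0")
# INDEXED BY forces the read through idx_checked: it already reflects the UPDATE.
after = conn.execute(count_sql).fetchone()[0]

# Refresh the planner's statistics (the SQLite counterpart of MySQL's ANALYZE TABLE).
conn.execute("ANALYZE")
stats = conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'items'").fetchall()
print(before, after, stats)
```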
