MySQL: optimizing a query that uses DISTINCT
In my Java application I have found a small performance issue, which is caused by this simple query:
SELECT DISTINCT a
FROM table
WHERE checked = 0
LIMIT 10000
I have an index on the checked column.
In the beginning, the query is very fast (i.e. when almost all rows have checked = 0). But as I mark more and more rows as checked, the query becomes very inefficient (taking up to several minutes).
How can I improve the performance of this query? Should I add a composite index on
- a, checked
or rather
- checked, a?
My table has many millions of rows, which is why I do not want to test this manually and hope for a lucky guess.
Comments (5)
I would add an index on (checked, a). This means that the value you're returning is already found in the index, so there's no need to go back to the table to look it up.
Secondly, if you're doing lots of individual updates to the table, there's a good chance that both the table and the index have become fragmented on disk. Rebuilding (compacting) the table and index can significantly improve performance.
You can also use the query rewritten as (just in case the optimizer does not understand that it's equivalent):
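The rewritten query is not preserved in this copy of the answer. As a sketch only: the table name t is a hypothetical stand-in (TABLE is a reserved word in MySQL), the index name idx_checked_a is made up, and the GROUP BY form is an assumption about what "rewritten" meant here. Under those assumptions, the suggestions in this answer might look like this:

```sql
-- Hypothetical names: t stands in for the real table, idx_checked_a is made up.
-- With this index, both the filter column and the returned column live in
-- the index, so the query can be answered without touching the table rows.
CREATE INDEX idx_checked_a ON t (checked, a);

-- GROUP BY form of the DISTINCT query (an assumed reading of "rewritten");
-- it returns the same set of a values as SELECT DISTINCT a.
SELECT a
FROM t
WHERE checked = 0
GROUP BY a
LIMIT 10000;

-- Rebuild the table and its indexes to reduce fragmentation.
-- On InnoDB this is carried out as a table recreate plus analyze.
OPTIMIZE TABLE t;
```

If the index is used as a covering index, the Extra column of EXPLAIN for this query would be expected to contain Using index.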
Add an index on the DISTINCT column (a in this case). MySQL is able to use this index for the DISTINCT.
MySQL may also benefit from a compound index on (a, checked) (the order matters: the DISTINCT column has to be at the start of the index). Try both and compare the results with your data and your queries.
(After adding this index you should see Using index for group-by in the EXPLAIN output.)
See GROUP BY optimization in the manual. (A DISTINCT is very similar to a GROUP BY.)
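The comparison suggested in this answer can be sketched as follows. The table name t and both index names are hypothetical, and whether the optimizer actually chooses the index-based plan depends on the data distribution and the MySQL version, so treat this as a starting point rather than a guaranteed result:

```sql
-- Hypothetical names throughout: t, idx_a, idx_a_checked.
CREATE INDEX idx_a ON t (a);
CREATE INDEX idx_a_checked ON t (a, checked);

-- Inspect the key and Extra columns of the output. Per the answer, a plan
-- that resolves DISTINCT directly from the index reports
-- "Using index for group-by" in Extra.
EXPLAIN
SELECT DISTINCT a
FROM t
WHERE checked = 0
LIMIT 10000;

-- To compare the two candidates, an index hint can pin the plan to one of them.
EXPLAIN
SELECT DISTINCT a
FROM t FORCE INDEX (idx_a_checked)
WHERE checked = 0
LIMIT 10000;
```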
In this case it seems that the best index would be a simple (a).
UPDATE:
It was not clear how many rows get checked. From your comment below the question:
This changes everything. So @Ben has the correct answer.
I have found a completely different solution that should do the trick. I will simply create a new table with all possible unique "a" values. This will allow me to avoid DISTINCT.
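The post does not say what the new table looks like or how it would be queried, so the following is only one possible shape. The names t and a_values, the INT type of a, and the EXISTS lookup are all assumptions made for illustration:

```sql
-- Hypothetical lookup table that stores each distinct a exactly once.
-- The column type INT is an assumption; match it to the real column.
CREATE TABLE a_values (
    a INT NOT NULL PRIMARY KEY
);

-- Populate it once from the existing data (t is a stand-in table name).
INSERT IGNORE INTO a_values (a)
SELECT DISTINCT a
FROM t
WHERE a IS NOT NULL;

-- Each a appears once in a_values, so no DISTINCT is needed here.
-- The EXISTS probe would benefit from an index on t (a, checked).
SELECT v.a
FROM a_values AS v
WHERE EXISTS (
    SELECT 1
    FROM t
    WHERE t.a = v.a
      AND t.checked = 0
)
LIMIT 10000;
```

The trade-off in this sketch is that a_values has to be kept in sync whenever new a values are inserted into the main table.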
You don't state it, but are you updating the index regularly? As changes occur to the underlying data, the index becomes less and less accurate and processing gets worse and worse. If you have an index on checked, and checked is being updated over time, you need to make sure your index is updated accordingly on a regular basis.
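For context, MySQL keeps index entries in sync with the table on every write, so no manual index update is needed for correctness. What can drift is the index statistics the optimizer relies on. Assuming that is what this answer refers to, and using the hypothetical table name t, the statistics can be refreshed like this:

```sql
-- t is a hypothetical stand-in for the real table name.
-- Recomputes the key distribution statistics used by the optimizer.
ANALYZE TABLE t;

-- Shows the indexes on the table along with their estimated cardinality.
SHOW INDEX FROM t;
```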