MySql 和内联 SELECT
我有一个如下所示的查询:
select
id
, int1
, int2
, (select count(*) from big_table_with_millions_of_rows
where id between t.int1 and t.int2)
from myTable t
where
....
此选择仅返回一行。内联选择中使用的 id 是索引列(主键)。如果我将 t.int1
和 t.int2
替换为该单行返回的 int1/int2 值,则查询将在几毫秒内完成。如果我执行上面的查询 - 即引用 int1/int2,大约需要 10 分钟。当我运行探查器并查看实际发生的情况时,我发现 99% 的时间引擎都忙于从内联查询返回数据。看起来好像 MySql 实际上
select ... from big_table_with_millions_of_rows
在将内联查询的位应用于
where id between t.int1 and t.int2
结果之前运行了一次。这是真的吗?如果不是,那是怎么回事?我一直认为内联 SELECT 具有潜在的危险,因为它们作为查询的最后一个元素逐行执行,但对于像这样的情况,初始的 SELECT code> 确实是高度选择性的,它可以非常高效。任何人都可以阐明这一点吗?
编辑:感谢您迄今为止的反馈。我关心的并不是内联查询的逐行性质,而是当面对变量而不是(相同的)硬编码值时,它似乎无法使用主键索引。我的猜测是,如果最近没有运行 ANALYZE,那么优化器会假设它必须执行表扫描,因为它不了解数据分布。但是范围查找是在主键上完成的这一事实不应该弥补这一点吗?
I have a query that looks like this:
select
id
, int1
, int2
, (select count(*) from big_table_with_millions_of_rows
where id between t.int1 and t.int2)
from myTable t
where
....
This select returns exactly one row. The id used in the inline select is an indexed column (primary key). If I replace t.int1
and t.int2
with the values of int1/int2 returned by this single row, the query completes in milliseconds. If I execute the query as above - i.e. with references to int1/int2, it takes about 10 minutes. When I run profiler and look at what actually happens, I see that 99% of the time the engine is busy returning data from the inline query. It looks as though MySql is actually running the
select ... from big_table_with_millions_of_rows
bit of the inline query once before applying the
where id between t.int1 and t.int2
bit to the result. Can this be true? If not, then what is going on? I had always thought that inline SELECT
s were potentially hazardous because they are executed row-by-row as the last element of the query, but for situations like this, where the initial SELECT
is indeed highly selective, it can be very efficient. Can anyone shed any light on this?
EDIT: thanks for the feedback so far. My concern is not so much about the row-by-row nature of the inline query, but rather the fact that it seems unable to use the primary key index when faced with variables rather than (the same) hardcoded values. My guess would be that if ANALYZE has not been run recently, then the optimizer assumes it has to do a table scan as it has no knowledge about the data distribution. But shouldn't the fact that the range lookup is done on the primary key not compensate for that?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果相关子查询没有得到很好的优化,那么尝试这个查询:
这应该优化得更好。
回复您更新的问题:是的,就优化而言,MySQL 并不是市场上最复杂的 RDBMS。当 MySQL 无法优化像这样的极端情况时,请不要感到惊讶。
我是 MySQL 的粉丝,因为它易于使用、开源以及所有这些好东西,但事实是,它的竞争对手在技术方面远远领先于 MySQL。每个 RDBMS 都有一些“盲点”,但 MySQL 的盲点似乎更大。
还要确保您使用的是最新版本的 MySQL。他们在每个版本中改进了优化器,因此您可能会使用新版本获得更好的结果。
If the correlated subquery isn't optimized well, then try this query:
That should optimize much better.
Re your updated question: Right, MySQL is not the most sophisticated RDBMS on the market in terms of optimization. Don't be surprised when MySQL can't optimize corner cases like this.
I'm a fan of MySQL for its ease of use and open source and all those good things, but the truth is that its competitors are far ahead of MySQL in terms of technology. Every RDBMS has some "blind spots" but MySQL's seem to be larger.
Also be sure you're using the latest version of MySQL. They improve the optimizer in every release, so you might get better results with a newer version.
如果可以的话,尝试使用 JOIN 来避免相关子查询。
在 youtube 上观看有关 MySQL 性能的精彩视频。转到 31:00 分钟。演讲者 Jay Pipes 谈论如何避免相关子查询。
Try to avoid correlated subqueries by using JOIN if you can.
Watch this great video on MySQL performance on youtube. Go to 31:00 minute in. The speaker Jay Pipes talks about avoiding correlated subqueries.
如果子查询引用其包含查询中的字段,则必须针对包含查询中的每一行重新运行子查询,因为每一行中引用的字段可能不同。如果它是完全独立的,则可以在外部查询开始处理之前运行一次。
If a subquery references fields from its containing query, the subquery has to be rerun per every row in the containing query, because the referenced fields may be different in each row. If it's completely self-contained, it can be run a single time before the outer query begins processing.