MySql 和内联 SELECT

发布于 2024-08-14 05:15:40 字数 877 浏览 6 评论 0原文

我有一个如下所示的查询：

select
id
, int1
, int2
, (select count(*) from big_table_with_millions_of_rows 
    where id between t.int1 and t.int2)
from myTable t
where
....

此选择仅返回一行。内联选择中使用的 id 是索引列（主键）。如果我将 t.int1 和 t.int2 替换为该单行返回的 int1/int2 值，则查询将在几毫秒内完成。如果我执行上面的查询 - 即引用 int1/int2，大约需要 10 分钟。当我运行探查器并查看实际发生的情况时，我发现 99% 的时间引擎都忙于从内联查询返回数据。看起来好像 MySql 实际上

select ... from big_table_with_millions_of_rows

在将内联查询的位应用于

where id between t.int1 and t.int2

结果之前运行了一次。这是真的吗？如果不是，那是怎么回事？我一直认为内联 SELECT 具有潜在的危险，因为它们作为查询的最后一个元素逐行执行，但对于像这样的情况，初始的 SELECT code> 确实是高度选择性的，它可以非常高效。任何人都可以阐明这一点吗？

编辑：感谢您迄今为止的反馈。我关心的并不是内联查询的逐行性质，而是当面对变量而不是（相同的）硬编码值时，它似乎无法使用主键索引。我的猜测是，如果最近没有运行 ANALYZE，那么优化器会假设它必须执行表扫描，因为它不了解数据分布。但是范围查找是在主键上完成的这一事实不应该弥补这一点吗？

原文

I have a query that looks like this:

select
id
, int1
, int2
, (select count(*) from big_table_with_millions_of_rows 
    where id between t.int1 and t.int2)
from myTable t
where
....

This select returns exactly one row. The id used in the inline select is an indexed column (primary key). If I replace t.int1 and t.int2 with the values of int1/int2 returned by this single row, the query completes in milliseconds. If I execute the query as above - i.e. with references to int1/int2, it takes about 10 minutes. When I run profiler and look at what actually happens, I see that 99% of the time the engine is busy returning data from the inline query. It looks as though MySql is actually running the

select ... from big_table_with_millions_of_rows

bit of the inline query once before applying the

where id between t.int1 and t.int2

bit to the result. Can this be true? If not, then what is going on? I had always thought that inline SELECTs were potentially hazardous because they are executed row-by-row as the last element of the query, but for situations like this, where the initial SELECT is indeed highly selective, it can be very efficient. Can anyone shed any light on this?

EDIT: thanks for the feedback so far. My concern is not so much about the row-by-row nature of the inline query, but rather the fact that it seems unable to use the primary key index when faced with variables rather than (the same) hardcoded values. My guess would be that if ANALYZE has not been run recently, then the optimizer assumes it has to do a table scan as it has no knowledge about the data distribution. But shouldn't the fact that the range lookup is done on the primary key not compensate for that?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一曲琵琶半遮面シ 2024-08-21 05:15:40

如果相关子查询没有得到很好的优化，那么尝试这个查询：

select
  t.id
, t.int1
, t.int2
, count(*)
from myTable t
left outer join big_table_with_millions_of_rows b
  on (b.id between t.int1 and t.int2)
where
....
group by t.id

这应该优化得更好。

回复您更新的问题：是的，就优化而言，MySQL 并不是市场上最复杂的 RDBMS。当 MySQL 无法优化像这样的极端情况时，请不要感到惊讶。

我是 MySQL 的粉丝，因为它易于使用、开源以及所有这些好东西，但事实是，它的竞争对手在技术方面远远领先于 MySQL。每个 RDBMS 都有一些“盲点”，但 MySQL 的盲点似乎更大。

还要确保您使用的是最新版本的 MySQL。他们在每个版本中改进了优化器，因此您可能会使用新版本获得更好的结果。

If the correlated subquery isn't optimized well, then try this query:

select
  t.id
, t.int1
, t.int2
, count(*)
from myTable t
left outer join big_table_with_millions_of_rows b
  on (b.id between t.int1 and t.int2)
where
....
group by t.id

That should optimize much better.

Re your updated question: Right, MySQL is not the most sophisticated RDBMS on the market in terms of optimization. Don't be surprised when MySQL can't optimize corner cases like this.

I'm a fan of MySQL for its ease of use and open source and all those good things, but the truth is that its competitors are far ahead of MySQL in terms of technology. Every RDBMS has some "blind spots" but MySQL's seem to be larger.

Also be sure you're using the latest version of MySQL. They improve the optimizer in every release, so you might get better results with a newer version.

回复收藏 0 原文