Is there a performance difference between BETWEEN and IN in MySQL, or in SQL in general?
I have a set of consecutive rows I want to fetch by primary key, which is an auto-incrementing integer. Assuming there are no gaps, is there any performance difference between:
SELECT * FROM `theTable` WHERE `id` IN (n, ... nk);
and:
SELECT * FROM `theTable` WHERE `id` BETWEEN n AND nk;
BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:
- m is the size of your table
- n is the size of your range

Index can be used (n is tiny compared to m)
In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, followed by a traversal of at most n index leaf nodes. The complexity will be O(n + log m).
IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will be O(n * log m), which is asymptotically worse (the difference is negligible for very small tables m or very small ranges n).

Index cannot be used (n is a significant portion of m)
In any case, you'll get a full table scan that evaluates the predicate on each row:
BETWEEN needs to evaluate two predicates: one for the lower and one for the upper bound. The complexity is O(m).
IN needs to evaluate at most n predicates. The complexity is O(m * n), which is again worse, or perhaps O(m) if the database can optimise the IN list into a hashmap rather than a list of predicates.
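The indexed case above can be illustrated by counting key comparisons. This is only a rough sketch: a sorted Python list stands in for the primary key index, and the helper names (lower_bound, between_scan, in_lookups) are made up for illustration. It does not model what MySQL or any other engine actually does internally.

```python
def lower_bound(keys, target):
    """Binary search: return (first index i with keys[i] >= target, comparisons used)."""
    lo, hi, comparisons = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


def between_scan(keys, low, high):
    """One descent to `low`, then walk neighbouring keys until one exceeds `high`."""
    pos, comparisons = lower_bound(keys, low)
    rows = []
    while pos < len(keys):
        comparisons += 1  # upper-bound check for this key
        if keys[pos] > high:
            break
        rows.append(keys[pos])
        pos += 1
    return rows, comparisons


def in_lookups(keys, values):
    """One independent descent per listed value."""
    rows, comparisons = [], 0
    for v in values:
        pos, used = lower_bound(keys, v)
        comparisons += used + 1  # +1 for the final equality check
        if pos < len(keys) and keys[pos] == v:
            rows.append(keys[pos])
    return rows, comparisons


keys = list(range(1, 1_000_001))   # m = 1,000,000 gap-free ids (assumed)
wanted = list(range(5000, 5100))   # n = 100 consecutive ids (assumed)

between_rows, between_cost = between_scan(keys, 5000, 5099)
in_rows, in_cost = in_lookups(keys, wanted)
print("BETWEEN comparisons:", between_cost)
print("IN comparisons:", in_cost)
```

Both strategies return the same rows; only the comparison counts differ, roughly n + log m against n * log m.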
a BETWEEN b AND c is a macro that expands to b <= a AND a <= c.
a IN (b, c, d) is a macro that expands to a = b OR a = c OR a = d.
Assuming your n and nk are integers, both should end up meaning the same. The BETWEEN variant should be much faster because it needs only two comparisons, versus up to nk - n + 1 comparisons for the IN variant.
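As a small sketch of the two expansions described above, the predicates can be written out as plain Python functions. The function names and the values n = 10, nk = 20 are chosen only for illustration; the point is that the two forms agree on integers but not on values that fall between the listed ones.

```python
def between(a, b, c):
    # a BETWEEN b AND c  ->  b <= a AND a <= c  (always two comparisons)
    return b <= a and a <= c


def in_list(a, values):
    # a IN (v1, v2, ...)  ->  a = v1 OR a = v2 OR ...  (up to len(values) comparisons)
    return any(a == v for v in values)


n, nk = 10, 20                       # assumed bounds for the example
id_list = list(range(n, nk + 1))     # the explicit IN list: n, n+1, ..., nk

# For integer inputs the two predicates give the same answer.
agree_on_integers = all(
    between(a, n, nk) == in_list(a, id_list) for a in range(0, 31)
)

# For a non-integer value inside the range they differ.
differ_on_fraction = between(10.5, n, nk) != in_list(10.5, id_list)

print(agree_on_integers, differ_on_fraction)
```

This is why the "assuming integers" caveat matters: on a non-integer column the two queries are not interchangeable.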
I have done some research on this question.
I have 11M rows in my table. I executed two queries against it:
Query 1:
SELECT * FROM PLAYERS WHERE SCORE BETWEEN 10 AND 20
Query 2:
SELECT * FROM PLAYERS WHERE SCORE IN (10,11,...,20)
At execution time, both queries are translated as Andomar described above.
Of the two, Query 1 runs faster than Query 2.
To learn more, follow this link:
Performance of BETWEEN VS IN() in MySQL
Thank you.
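A measurement like the one above can be set up along the following lines. This is a sketch under loud assumptions: it uses Python's built-in sqlite3 module rather than MySQL, a synthetic 50,000-row players table instead of the 11M-row one, and no index on score, so the timings it prints say nothing about MySQL itself. It only shows the shape of the comparison: same data, both queries, same row count, time each.

```python
import sqlite3
import time

# In-memory SQLite database as a stand-in (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO players (id, score) VALUES (?, ?)",
    ((i, i % 100) for i in range(1, 50_001)),  # synthetic scores 0..99
)
conn.commit()

in_values = ", ".join(str(v) for v in range(10, 21))  # 10, 11, ..., 20
queries = {
    "between": "SELECT * FROM players WHERE score BETWEEN 10 AND 20",
    "in": f"SELECT * FROM players WHERE score IN ({in_values})",
}


def run_timed(sql):
    """Run one query and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return len(rows), time.perf_counter() - start


results = {name: run_timed(sql) for name, sql in queries.items()}
for name, (row_count, seconds) in results.items():
    print(f"{name}: {row_count} rows in {seconds:.4f}s")
```

On a real MySQL server the same two statements would be run against the actual table, ideally several times each and together with EXPLAIN, before drawing conclusions.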
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
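The difference between the two membership strategies can be sketched in a few lines of Python. This is an illustration of the general technique (sort once, then binary search) using the standard bisect module, not MySQL's actual code; the function names and the sample list are invented for the example.

```python
from bisect import bisect_left


def or_chain_contains(values, x):
    """x = v1 OR x = v2 OR ...: up to len(values) equality checks."""
    checks = 0
    for v in values:
        checks += 1
        if x == v:
            return True, checks
    return False, checks


def sorted_list_contains(sorted_values, x):
    """Binary search in a pre-sorted list: about log2(len) probes."""
    i = bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x


in_list = [42, 7, 19, 3, 88, 61, 25, 70]   # hypothetical IN() list
sorted_in_list = sorted(in_list)           # sorted once per query, not once per row

found_linear, checks = or_chain_contains(in_list, 70)
found_binary = sorted_list_contains(sorted_in_list, 70)
missing_binary = sorted_list_contains(sorted_in_list, 71)

print(found_linear, checks, found_binary, missing_binary)
```

The one-off sort costs O(n log n), but it is paid once, while the membership test runs for every candidate row, which is where the O(log n) versus O(n) difference adds up.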