SQL performance: which is faster, IN() or JOIN?
This is a question to which I've never gotten a definitive answer. I am using MySQL in this example.
Given a reasonably large set of values (let's say 500), is it quicker to search a table using these values with an IN() clause:
SELECT * FROM `table` WHERE field IN (values);
Or by creating a temporary table in memory, populating it with the values and joining it to the table being searched:
CREATE TEMPORARY TABLE `temp_table` (`field` varchar(255) NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO temp_table (field) VALUES ('value1'), ('value2'); -- one parenthesized row per value
SELECT * FROM `table` t1 JOIN temp_table t2 ON t1.field = t2.field;
Both methods will produce the same result set.
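To make the comparison concrete, here is a minimal sketch that runs both approaches and checks that they return the same rows. It uses Python's built-in sqlite3 module purely as a stand-in so the snippet is self-contained; SQLite is not MySQL, and the table name `items`, the index name, and the generated values are all made up for illustration. It also assumes the lookup values are unique and selects only the columns of the searched table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    [(f"v{i}",) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_items_field ON items (field)")

# 500 hypothetical lookup values (every second row).
wanted = [f"v{i}" for i in range(0, 1000, 2)]

# Approach 1: IN() with one bound placeholder per value.
placeholders = ",".join("?" for _ in wanted)
in_rows = conn.execute(
    f"SELECT id, field FROM items WHERE field IN ({placeholders}) ORDER BY id",
    wanted,
).fetchall()

# Approach 2: fill a temporary table, then join against it.
conn.execute("CREATE TEMPORARY TABLE temp_table (field TEXT NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO temp_table (field) VALUES (?)", [(v,) for v in wanted])
join_rows = conn.execute(
    "SELECT t1.id, t1.field FROM items t1 "
    "JOIN temp_table t2 ON t1.field = t2.field ORDER BY t1.id"
).fetchall()

print(len(in_rows), len(join_rows), in_rows == join_rows)
```

The PRIMARY KEY on the temporary table is a design choice of this sketch, not part of the original statements: it rejects duplicate lookup values, which is what keeps the join from returning more rows than IN() does.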
I have done some of my own basic benchmarking tests and found that when dealing with more than 500 values it becomes quicker to use a temporary table than an IN() clause.
Can someone explain to me the inner workings of MySQL and what a proper answer to this question is?
Thanks,
Leo
Comments (2)
From the MySQL online documentation on IN():
Given that, I believe it makes sense to use IN() with a set of constants; otherwise you should use a subquery on another table.
You can consider using EXISTS() instead of JOIN when the items are retrieved from another table; it would be significantly faster for large data sets.
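As a rough illustration of the EXISTS() suggestion, the sketch below shows one way the two forms differ in behaviour, independent of speed: a JOIN repeats an outer row once per matching lookup row, while EXISTS() returns each outer row at most once. It again uses Python's sqlite3 module as a stand-in for MySQL, and the tables `items` and `lookup` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT NOT NULL);
    CREATE TABLE lookup (field TEXT NOT NULL);
    INSERT INTO items (field) VALUES ('a'), ('b'), ('c');
    -- 'a' appears twice in the lookup table on purpose.
    INSERT INTO lookup (field) VALUES ('a'), ('a'), ('c');
    """
)

# JOIN: one output row per matching (items, lookup) pair.
join_ids = [
    row[0]
    for row in conn.execute(
        "SELECT t1.id FROM items t1 "
        "JOIN lookup t2 ON t1.field = t2.field ORDER BY t1.id"
    )
]

# EXISTS: each items row is kept or dropped, never repeated.
exists_ids = [
    row[0]
    for row in conn.execute(
        "SELECT t1.id FROM items t1 "
        "WHERE EXISTS (SELECT 1 FROM lookup t2 WHERE t2.field = t1.field) "
        "ORDER BY t1.id"
    )
]

print(join_ids)    # [1, 1, 3]
print(exists_ids)  # [1, 3]
```

Whether EXISTS() is actually faster than the JOIN depends on the server version, the indexes, and the data, so it is worth measuring rather than assuming.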
The correct answer depends on many things.
You've done the work already - if your benchmarking tells you that using a temporary table is faster, then that's the way to go.
Remember to benchmark again if you change the hardware, or dramatically alter the schema.
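If it helps, re-running the benchmark can be as simple as a small timing harness like the sketch below. It is only an outline of the method: it runs against an in-memory SQLite database through Python's sqlite3 module, so the numbers it prints say nothing about MySQL, and the helper `best_of`, the table `items`, and the row counts are all invented for the example. To get meaningful results, point the same pattern at the real server, schema, and data.

```python
import sqlite3
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in data set (assumption): 20,000 rows, 500 lookup values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    ((f"v{i}",) for i in range(20000)),
)
conn.execute("CREATE INDEX idx_items_field ON items (field)")
wanted = [f"v{i}" for i in range(0, 20000, 40)]


def run_in():
    marks = ",".join("?" for _ in wanted)
    sql = f"SELECT id FROM items WHERE field IN ({marks})"
    return conn.execute(sql, wanted).fetchall()


def run_temp_join():
    # Creating and filling the temp table is part of this approach's cost,
    # so it is deliberately inside the timed function.
    conn.execute("DROP TABLE IF EXISTS temp_table")
    conn.execute("CREATE TEMPORARY TABLE temp_table (field TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO temp_table (field) VALUES (?)", ((v,) for v in wanted)
    )
    sql = "SELECT t1.id FROM items t1 JOIN temp_table t2 ON t1.field = t2.field"
    return conn.execute(sql).fetchall()


results = {
    "IN()": best_of(run_in),
    "temp table + JOIN": best_of(run_temp_join),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from caching and other activity on the machine; repeating the measurement with different numbers of lookup values shows where, if anywhere, the two approaches cross over.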