Is it faster to restrict a ResultSet with a complex boolean condition on the MySQL side or on the Java side?
Let's say I have a really big table filled with lots of data (say, enough not to fit comfortably in memory), and I want to analyze a subset of the rows.
Is it generally faster to do:
SELECT column1, column2, ... , columnN FROM table WHERE (some complicated boolean clause);
and then use the ResultSet, or is it faster to do:
SELECT column1, column2, ... , columnN FROM table;
and then iterate over the ResultSet, accepting different rows based on a Java version of the boolean condition?
I think it comes down to whether the Java iterator/boolean evaluator is faster than the MySQL boolean evaluator.
Comments (4)
It is almost certainly faster to send the condition to the database.
No. The deciding factor will almost certainly be the amount of data that has to be transported over the network (and assorted overhead). Reducing the result set size on the DB server is the right thing to do 99% of the time. This is especially true in complex queries where it could lead to smaller joins.
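To put rough numbers on the network argument, here is a back-of-the-envelope sketch. The table size, row width, and selectivity are made-up figures, and `TransferEstimate` is a hypothetical helper written for this illustration, not part of JDBC or MySQL. It only counts payload bytes and ignores protocol overhead.

```java
public class TransferEstimate {
    /** Approximate payload sent to the client: rows that survive the filter times bytes per row. */
    public static long bytesTransferred(long totalRows, int bytesPerRow, double selectivity) {
        long matchingRows = Math.round(totalRows * selectivity);
        return matchingRows * bytesPerRow;
    }

    public static void main(String[] args) {
        long totalRows = 100_000_000L; // assumed table size
        int bytesPerRow = 200;         // assumed average row width
        double selectivity = 0.01;     // assumed: the WHERE clause keeps 1% of rows

        // WHERE evaluated on the server: only matching rows cross the network.
        long filteredOnServer = bytesTransferred(totalRows, bytesPerRow, selectivity);
        // No WHERE clause: every row is shipped, then filtered in Java.
        long filteredInJava = bytesTransferred(totalRows, bytesPerRow, 1.0);

        System.out.println("WHERE on server: " + filteredOnServer + " bytes");
        System.out.println("Filter in Java:  " + filteredInJava + " bytes");
    }
}
```

Under these assumed figures the unfiltered query ships 100 times as much data, and that gap scales directly with how selective the condition is.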
As a general rule, the database wins. That will almost certainly be the case for you. If you want to be sure though, profile it. I have run into cases in other languages where the overhead of transferring a lot of data was offset by the fact that some of the processing could be done outside of the DB much faster than in it. If the boolean condition you are evaluating is extremely complex to express in relational terms, you could see a benefit in evaluating it in Java, but it is extremely unlikely.
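If you do want to profile it, a minimal timing sketch with plain JDBC could look like the following. Everything specific here is an assumption for illustration: the `events` table, its `id`, `score`, and `status` columns, the threshold `50`, and the value `"active"` are all hypothetical, and you need a MySQL JDBC driver on the classpath to actually run `main`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FilterBenchmark {
    // Hypothetical schema: events(id INT, score INT, status VARCHAR).
    static final String FILTERED_SQL =
        "SELECT id, score, status FROM events WHERE score > ? AND status = ?";
    static final String UNFILTERED_SQL = "SELECT id, score, status FROM events";

    /** Java version of the condition in the WHERE clause above. */
    public static boolean matches(int score, String status, int minScore, String wanted) {
        return score > minScore && wanted.equals(status);
    }

    /** Option 1: the database filters, so only matching rows cross the network. */
    static long countFilteredInDb(Connection conn, int minScore, String wanted) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FILTERED_SQL)) {
            ps.setInt(1, minScore);
            ps.setString(2, wanted);
            try (ResultSet rs = ps.executeQuery()) {
                long n = 0;
                while (rs.next()) n++;
                return n;
            }
        }
    }

    /** Option 2: fetch every row and apply the condition in Java. */
    static long countFilteredInJava(Connection conn, int minScore, String wanted) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UNFILTERED_SQL);
             ResultSet rs = ps.executeQuery()) {
            long n = 0;
            while (rs.next()) {
                if (matches(rs.getInt("score"), rs.getString("status"), minScore, wanted)) n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: FilterBenchmark <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            long t0 = System.nanoTime();
            long inDb = countFilteredInDb(conn, 50, "active");
            long t1 = System.nanoTime();
            long inJava = countFilteredInJava(conn, 50, "active");
            long t2 = System.nanoTime();
            System.out.printf("DB filter:   %d rows in %d ms%n", inDb, (t1 - t0) / 1_000_000);
            System.out.printf("Java filter: %d rows in %d ms%n", inJava, (t2 - t1) / 1_000_000);
        }
    }
}
```

A few caveats if you try this. Run each variant several times, since the first query warms the database's caches. The two counts can also differ: MySQL's default collations compare strings case-insensitively, while `String.equals` does not. And as far as I know, MySQL Connector/J buffers the entire result set in client memory by default, so the unfiltered variant may simply run out of heap on a table that does not fit in memory.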
The database was designed to optimize your task. Your language wasn't. And the database probably has better caching resources to prevent disk operations than does your workstation with everything else it's doing.
This is a little like asking whether you should download the data into Excel first when the data set is bigger than Excel can hold in memory.