Splitting a long-running SQL query into multiple smaller queries
I am using SQL Server 2008 and Java 6 with Spring JDBC.
We have a table with roughly 60 million records.
We need to load the entire table into memory, but running select * against it takes hours to complete.
So I am splitting the query as follows:
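// Base query; each worker appends its own predicate on the primary key.
String query = "select * from TABLE where ";
for (int i = 0; i < 10; i++) {
    // Bucket i selects the rows whose key leaves remainder i modulo 10.
    String sql = query + "(sk_table_id % 10) = " + i;
    // service is the ExecutorCompletionService that runs the 10 queries in parallel.
    service.submit(new ParallelCacheBuilder(sql, namedParameters, jdbcTemplate));
}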
Basically, I am splitting the query by adding a where condition on the primary key column.
The snippet above splits the query into 10 queries that run in parallel, using Java's ExecutorCompletionService.
I am not a SQL expert, but I suspect each of these queries still has to load the same data into memory before applying the modulo operator to the primary key column.
Is this a good or a bad approach? If there is a better way, please post it.
Thanks in advance!!!
Comments (1)
If you do need all 60M records in memory,
select * from ...
is the fastest approach. Yes, it's a full scan; there's no way around that. It's disk-bound, so multithreading won't help you at all. Not having enough memory available (swapping) will kill performance instantly. Data structures that take significant time to expand will hamper performance, too. Open the Task Manager and see how much CPU is being used; probably little. If it is high, profile your code, or just comment out everything but the reading loop. Or maybe the bottleneck is the network between the SQL server and your machine.
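On the point about data structures that take time to expand, here is a minimal sketch, assuming the rows end up in a java.util.HashMap keyed by sk_table_id. The class and method names (PresizedCache, capacityFor, newCache) are hypothetical and not from the question. The idea is to request enough capacity up front that the map never has to rehash while the 60M rows are being read. It is written without Java 7 features because the question mentions Java 6.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post.
final class PresizedCache {

    // A HashMap with the default load factor (0.75) resizes once its size
    // exceeds capacity * 0.75, so ask for enough capacity that
    // expectedEntries stays at or below that threshold.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75d);
    }

    // Creates an empty map that should not need to grow while
    // expectedEntries rows are inserted.
    static <V> Map<Long, V> newCache(int expectedEntries) {
        return new HashMap<Long, V>(capacityFor(expectedEntries));
    }
}
```

It might be used as `Map<Long, Object[]> cache = PresizedCache.<Object[]>newCache(60000000);` before the reading loop starts. Note that this allocates the whole bucket array immediately, which is large at this size, so the JVM heap has to be sized accordingly or the swapping problem mentioned above will appear.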
Maybe SQL Server can offload data faster to an external dump file in a known format using some internal pathway (Oracle can, for example). I'd explore dumping the table to a file and then parsing that file with C#; it could be faster, for instance because it won't interfere with the other queries the SQL server is serving at the same time.
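For what it's worth, SQL Server does ship a bulk-copy utility, bcp, that can export a table to a flat file in character mode (`-c`). Whether it beats a plain select here is something you would have to measure. Below is a rough sketch of the parsing side, in Java rather than C# since the question uses Java. It assumes a tab-delimited file with one row per line and no quoting or escaping. The class name DumpFileLoader and its parameters are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads a tab-delimited dump file into memory.
final class DumpFileLoader {

    // Assumed layout: fields separated by '\t', one row per line,
    // no quoting or escaping inside fields.
    static List<String[]> load(String path, String charset, int expectedRows)
            throws IOException {
        // Presize the list so it does not have to grow during the load.
        List<String[]> rows = new ArrayList<String[]>(expectedRows);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset), 1 << 16);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields.
                rows.add(line.split("\t", -1));
            }
        } finally {
            in.close();
        }
        return rows;
    }
}
```

Every field stays a String in this sketch. Converting columns to numeric or date types, and handling NULLs, would depend on the table's actual schema, which the question does not show.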