处理大型 SQL 选择查询/读取块中的 SQL 数据
我正在使用.Net 4.0 和 SQL Server 2008 R2。
我正在运行一个大型 SQL 选择查询,它返回数百万个结果,并且需要很长时间才能完全运行。
有谁知道如何只读取查询返回的部分结果,而不必等待整个查询完成?
换句话说,我想在查询仍在运行并获取下一个结果的同时读取前 10,000 个记录块。
I'm using .Net 4.0 and SQL server 2008 R2.
I'm running a big SQL select query which returns millions of results and takes up a long time to fully run.
Does anyone know how can I read only some of the results returned by the query without having to wait for the whole query to complete?
In other words, I want to read the first by 10,000 records chunks while the query still runs and getting the next results.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这部分取决于查询本身是否是流式的,或者是否在临时表中完成大量工作,然后(最后)开始返回数据。在第二种情况下,除了重写查询之外,您不能做太多事情;然而,在第一种情况下,迭代器块通常会有所帮助,即,
现在这是一个流式迭代器 - 您可以对其进行
foreach
,它将从传入的 TDS 数据中实时检索记录,而无需先缓冲所有数据。如果您(也许是明智的)不想编写自己的具体化代码,可以使用一些工具来为您执行此操作 - 例如,LINQ-to-SQL 的
ExecuteQuery(tsql, args)
将无痛地完成上述操作。It depends in part on whether the query itself is streaming, or whether it does lots of work in temporary tables then (finally) starts returning data. You can't do much in the second scenario except re-write the query; however, in the first case an iterator block would usually help, i.e.
This is now a streaming iterator - you can
foreach
over it and it will retrieve records live from the incoming TDS data without buffering all the data first.If you (perhaps wisely) don't want to write your own materialization code, there are tools that will do this for you - for example, LINQ-to-SQL's
ExecuteQuery<T>(tsql, args)
will do the above pain-free.您需要使用数据分页。
SQL Server 有 TOP 子句(SQL TOP 10 a,b,c from d)和 BETWEEN:
有了这个,我猜你将能够检索 N 个数字行,进行一些部分处理,然后加载接下来的 N 行,依此类推。
这可以通过实现多线程解决方案来实现:一个将检索结果,而另一个将异步等待数据并进行一些处理。
You'd need to use data paging.
SQL Server has the TOP clause (SQL TOP 10 a,b,c from d) and BETWEEN:
Having this, I guess you'd be able of retrieving an N number of rows, do some partial processing, then load next N number of rows and so on.
This can be achieved by implementing a multithreaded solution: one will be retrieving results while the other will asynchronously wait for data and it'll be doing some processing.
如果你真的必须处理数百万条记录,为什么不每轮加载 10,000 条记录,然后加载下 10,000 条记录呢?如果不考虑在加载数据之前使用 DBMS 来过滤数据,因为数据库上的性能比逻辑层中的性能要好得多。
或者遵循延迟加载概念,仅加载 Id,仅在需要时才加载实际数据。
if you really have to process millions of records Why dont you load 10,000 each round process them and then load the next 10,000? if not consider using the DBMS to filter the data before loading it as the performance on the database is much better than in you logic leyer.
Or follow a lazy load concept and load only Ids to which you load the actual data only when you need it.