在查询结束之前开始查看查询结果
假设我查询一个包含 500K 行的表。我想开始查看获取缓冲区中保存结果集的任何行,即使查询尚未完成。我想滚动浏览提取缓冲区。如果我滚动得太远,我想显示一条消息,例如:“到达获取缓冲区的最后一行..查询尚未完成”。
- 当查询继续构建结果集时,可以使用 fgets() 读取提取缓冲区来完成此操作吗?这样做意味着多线程*
除了 FIRST ROWS 提示指令之外,Oracle、Informix、MySQL 或其他 RDBMS 中是否可以提供这样的功能?
整个想法是能够在长查询完成之前开始查看行,同时显示可立即查看的行数的计数器。
编辑:我的建议可能需要对数据库服务器的体系结构进行根本性的改变,即它们处理内部获取缓冲区的方式,例如锁定结果集直到查询完成等。像我这样的功能建议非常有用,特别是对于需要很长时间才能完成的查询。为什么必须等到整个查询完成,此时您可以开始查看一些结果,同时查询继续收集更多结果!
Lets say I query a table with 500K rows. I would like to begin viewing any rows in the fetch buffer, which holds the result set, even though the query has not yet completed. I would like to scroll thru the fetch buffer. If I scroll too far ahead, I want to display a message like: "REACHED LAST ROW IN FETCH BUFFER.. QUERY HAS NOT YET COMPLETED".
- Could this be accomplished using fgets() to read the fetch buffer while the query continues building the result set? Doing this implies multi-threading*
Can a feature like this, other than the FIRST ROWS hint directive, be provided in Oracle, Informix, MySQL, or other RDBMS?
The whole idea is to have the ability to start viewing rows before a long query completes, while displaying a counter of how many rows are available for immediate viewing.
EDIT: What I'm suggesting may require a fundamental change in a DB server's architecture, as to the way they handle their internal fetch buffers, e.g. locking up the result set until the query has completed, etc. A feature like the one I am suggesting would be very useful, especially for queries which take a long time to complete. Why have to wait until the whole query completes, when you could start viewing some of the results while the query continues to gather more results!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
释义:
看来您想要的是某种可以有两个(或更多)线程在工作的系统。一个线程将忙于从数据库同步获取数据,并向程序的其余部分报告其进度。另一个线程将处理显示。
目前尚不清楚您的查询是否会返回 500,000 行(事实上,我们希望它不会),尽管它可能必须扫描所有 500,000 行(并且到目前为止很可能只找到 23 行匹配)。确定要返回的行数很困难;确定要扫描的行数更容易;确定已扫描的行数非常困难。
因此,用户已滚动到第 23 行,但查询尚未完成。
这里有几个问题。 DBMS(对于大多数数据库来说都是如此,当然对于 IDS 也是如此)在处理一条语句时仍然与当前连接相关。获得有关查询进展情况的反馈很困难。您可以查看查询启动时返回的估计行数(SQLCA 结构中的信息),但这些值很可能是错误的。当您到达 23 行中的第 200 行时,您必须决定要做什么,否则您只能到达 5,697 行中的第 23 行。有总比没有好,但并不可靠。确定查询进展到什么程度是非常困难的。有些查询需要实际的排序操作,这意味着很难预测需要多长时间,因为在排序完成之前没有可用的数据(并且一旦排序完成,只有之间进行通信所花费的时间) DBMS 和应用程序来阻止数据的传输)。
Informix 4GL 有很多优点,但线程支持并不是其中之一。该语言在设计时并未考虑到线程安全,并且没有简单的方法可以将其改进到产品中。
我确实认为您所寻求的最容易由两个线程支持。在像 I4GL 程序这样的单线程程序中,在等待用户输入更多输入(例如“向下滚动下一个充满数据的页面”)时,没有一种简单的方法来获取行。
FIRST ROWS优化是对DBMS的一个提示;它可能会或可能不会给感知性能带来显着的好处。总体而言,这通常意味着从 DBMS 的角度来看,查询的处理效果较差,但快速向用户提供结果可能比 DBMS 上的工作负载更重要。
在下面某个被严重否决的答案中,弗兰克喊道(但请不要喊):
好的。这里的困难是在两个客户端进程之间组织 IPC。如果两者都连接到 DBMS,则它们具有单独的连接,因此一个会话的临时表和游标对另一个会话不可用。
并非所有查询都会生成临时表,尽管滚动游标的结果集通常确实具有与临时表大致相同的内容。 IDS 不需要在支持滚动游标的临时表上放置锁,因为只有 IDS 才能访问该表。如果它是常规临时表,则仍然不需要锁定它,因为除了创建它的会话之外,无法访问它。
也许更准确的状态消息是:
大概;您还可以使用“SELECT COUNT(*) FROM TheTable”获得快速准确的计数;这不会扫描任何内容,而只是访问控制数据 - 可能实际上与 SMI 表 sysmaster:sysactptnhdr 的 nrows 列中的数据相同。
因此,催生新流程显然并不是成功的秘诀。您必须将查询结果从生成的进程传输到原始进程。正如我所说,具有单独显示和数据库访问线程的多线程解决方案可以以某种方式工作,但使用 I4GL 执行此操作存在问题,因为它不支持线程。您仍然需要决定客户端代码如何存储用于显示的信息。
Paraphrasing:
It seems that what you would like is some sort of system where there can be two (or more) threads at work. One thread would be busy synchronously fetching the data from the database, and reporting its progress to the rest of the program. The other thread would be dealing with the display.
It isn't clear that your query will return 500,000 rows (indeed, let us hope it does not), though it may have to scan all 500,000 rows (and may well have only found 23 rows that match so far). Determining the number of rows to be returned is hard; determining the number of rows to be scanned is easier; determining the number of rows already scanned is very difficult.
So, the user has scrolled past the 23rd row, but the query is not yet completed.
There are a couple of issues here. The DBMS (true of most databases, and certainly of IDS) remains tied up as far as the current connection on processing the one statement. Obtaining feedback on how a query has progressed is difficult. You could look at the estimated rows returned when the query was started (information in the SQLCA structure), but those values are apt to be wrong. You'd have to decide what to do when you reach row 200 of 23, or you only get to row 23 of 5,697. It is better than nothing, but it is not reliable. Determining how far a query has progressed is very difficult. And some queries require an actual sort operation, which means that it is very hard to predict how long it will take because no data is available until the sort is done (and once the sort is done, there is only the time taken to communicate between the DBMS and the application to hold up the delivery of the data).
Informix 4GL has many virtues, but thread support is not one of them. The language was not designed with thread safety in mind, and there is no easy way to retrofit it into the product.
I do think that what you are seeking would be most easily supported by two threads. In a single-threaded program like an I4GL program, there isn't an easy way to go off and fetch rows while waiting for the user to type some more input (such as 'scroll down the next page full of data').
The FIRST ROWS optimization is a hint to the DBMS; it may or may not give a significant benefit to the perceived performance. Overall, it typically means that the query is processed less optimally from the DBMS perspective, but getting results to the user quickly can be more important than the workload on the DBMS.
Somewhere down below in a much down-voted answer, Frank shouted (but please don't SHOUT):
OK. The difficulty here is organizing the IPC between the two client-side processes. If both are connected to the DBMS, they have separate connections, and therefore the temporary tables and cursors of one session are not available to the other.
Not all queries result in a temporary table, though the result set for a scroll cursor usually does have something approximately equivalent to a temporary table. IDS does not need to place a lock on the temporary table backing a scroll cursor because only IDS can access the table. If it was a regular temp table, there'd still not be a need to lock it because it cannot be accessed except by the session that created it.
Maybe a more accurate status message would be:
Probably; you can also get a fast and accurate count with 'SELECT COUNT(*) FROM TheTable'; this does not scan anything but simply accesses the control data - probably effectively the same data as in the nrows column of the SMI table sysmaster:sysactptnhdr.
So, spawning a new process is not clearly a recipe for success; you have to transfer the query results from the spawned process to the original process. As I stated, a multithreaded solution with separate display and database access threads would work after a fashion, but there are issues with doing this using I4GL because it is not thread-aware. You'd still have to decide how the client-side code is going store the information for display.
存在三个基本限制因素:
查询的执行计划。如果执行计划在末尾有阻塞操作(例如排序或急切假脱机),则引擎无法在查询执行的早期返回行。它必须等到所有行都被完全处理,之后它会尽快将数据返回给客户端。这样做的时间本身可能很长,因此这部分可能适用于您正在谈论的内容。但一般来说,您不能保证查询很快就会有很多可用的内容。
数据库连接库。从数据库返回记录集时,驱动程序可以使用服务器端分页或客户端分页。使用哪个可以并且确实会影响返回哪些行以及何时返回。客户端分页强制立即返回整个查询,从而减少了在数据全部输入之前显示任何数据的机会。仔细使用正确的分页方法对于在查询生命周期的早期显示数据的机会至关重要。
客户端程序对同步或异步方法的使用。如果您只是复制并粘贴一些用于执行查询的 Web 示例代码,那么在查询仍在运行时,您不太可能使用早期结果 - 相反,该方法将阻塞,并且在全部完成之前您将一无所获。当然,服务器端分页(请参见第 2 点)可以缓解这种情况,但无论如何,如果您不专门使用异步方法,您的应用程序将被阻塞至少很短的时间。对于正在使用 .Net 阅读本文的任何人,您可能需要查看 .Net Framework 中的异步操作。
如果你把所有这些都做好了,并使用“FAST FIRSTROW”技术,你也许能够做一些你正在寻找的事情。但没有任何保证。
There are three basic limiting factors:
The execution plan of the query. If the execution plan has a blocking operation at the end (such as a sort or an eager spool), the engine cannot return rows early in the query execution. It must wait until all rows are fully processed, after which it will return the data as fast as possible to the client. The time for this may itself be appreciable, so this part could be applicable to what you're talking about. In general, though, you cannot guarantee that a query will have much available very soon.
The database connection library. When returning recordsets from a database, the driver can use server-side paging or client-side paging. Which is used can and does affect which rows will be returned and when. Client-side paging forces the entire query to be returned at once, reducing the opportunity for displaying any data before it is all in. Careful use of the proper paging method is crucial to any chance to display data early in a query's lifetime.
The client program's use of synchronous or asynchronous methods. If you simply copy and paste some web example code for executing a query, you will be a bit less likely to be working with early results while the query is still running—instead the method will block and you will get nothing until it is all in. Of course, server-side paging (see point #2) can alleviate this, however in any case your application will be blocked for at least a short time if you do not specifically use an asynchronous method. For anyone reading this who is using .Net, you may want to check out Asynchronous Operations in .Net Framework.
If you get all of these right, and use the FAST FIRSTROW technique, you may be able to do some of what you're looking for. But there is no guarantee.
可以使用分析函数来完成,但是如果没有索引,无论您做什么,Oracle 都必须全面扫描表以确定计数。分析可以简化您的查询:
返回的每一行将在 the_count 中包含查询返回的总行数。然而,正如我所说,Oracle 必须在返回任何内容之前完成查询以确定计数。
根据您处理查询的方式(例如,表单中的 PL/SQL 块),您可以使用上述查询来打开游标,然后循环游标并显示记录集,并让用户有机会取消。
It can be done, with an analytic function, but Oracle has to full scan the table to determine the count no matter what you do if there's no index. An analytic could simplify your query:
Each row returned will have the total count of rows returned by the query in the_count. As I said, however, Oracle will have to finish the query to determine the count before anything is returned.
Depending on how you're processing the query (e.g., a PL/SQL block in a form), you could use the above query to open a cursor, then loop through the cursor and display sets of records and give the user the chance to cancel.
我不确定您将如何完成此操作,因为查询必须在知道结果之前完成。 (据我所知)没有任何 RDBMS 提供任何方法来确定在查询完成之前已找到多少查询结果。
我无法实事求是地说这样的功能在 Oracle 中会有多昂贵,因为我从未见过源代码。然而,从外到内,我认为这会相当昂贵,并且可能会使查询完成所需的时间加倍(如果不是更多)。这意味着在每个结果之后更新原子计数器,当您谈论数百万可能的行时,这并不便宜。
I'm not sure how you would accomplish this, since the query has to complete prior to the results being known. No RDBMS (that I know of) offers any means of determining how many results to a query have been found prior to the query completing.
I can't speak factually for how expensive such a feature would be in Oracle because I have never seen the source code. From the outside in, however, I think it would be rather costly and could double (if not more) the length of time a query took to complete. It would mean updating an atomic counter after each result, which isn't cheap when you're talking millions of possible rows.
所以我将我的评论放入这个答案中 -
就甲骨文而言。
Oracle 在系统全局区域 (SGA) 内为每个实例维护自己的缓冲区高速缓存。缓冲区高速缓存的命中率取决于大小,大多数情况下达到 90%,这意味着十分之九的命中不会到达磁盘。
考虑到上述情况,即使有一种“方式”(可以这么说)来访问您运行的查询的缓冲区缓存,结果也将在很大程度上取决于缓存大小因素。如果缓冲区高速缓存太小,则高速缓存命中率会很小,并且会导致更多的物理磁盘 I/O,这将导致缓冲区高速缓存在临时数据内容方面不可靠。如果缓冲区高速缓存太大,则部分缓冲区高速缓存将未得到充分利用,并且内存资源将被浪费,这会导致在尝试访问缓冲区高速缓存时进行过多不必要的处理以查看缓冲区高速缓存获取您想要的数据。
另外,根据缓存大小和 SGA 内存,将由 ODBC 驱动程序/优化器来确定何时以及使用多少内容(缓存缓冲或直接磁盘 I/O)。
就尝试访问“缓冲区缓存”以查找您正在寻找的“行”而言,可能有一种方法(或在不久的将来)可以做到这一点,但没有办法知道您正在寻找什么for ("The row") 究竟存在或不存在。
此外,大型表的全表扫描通常会导致物理磁盘读取和较低的缓冲区缓存命中率。您可以通过查询
v$filestat
来了解数据文件级别的全表扫描活动加入SYS.dba_data_files
。以下是您可以使用的查询和示例结果:由于整个考验高度基于多个参数和统计数据,因此您正在寻找的结果可能仍然是由这些因素驱动的概率。
So I am putting up my comments into this answer-
In terms of Oracle.
Oracle maintains its own buffer cache inside the system global area (SGA) for each instance. The hit ratio on the buffer cache depends on the sizing and reaches 90% most of the time, which means 9 out of 10 hits will be satisfied without reaching the disk.
Considering the above, even if there is a "way" (so to speak) to access the buffer chache for a query you run, the results would highly depend on the cache sizing factor. If a buffer cache is too small, the cache hit ratio will be small and more physical disk I/O will result, which will render the buffer cache un-reliable in terms of temp-data content. If a buffer cache is too big, then parts of the buffer cache will be under-utilized and memory resources will be wasted, which in terms would render too much un-necessary processing trying to access the buffer cache while in order to peek in it for the data you want.
Also depending on your cache sizing and SGA memory it would be upto the ODBC driver / optimizer to determine when and how much to use what (cache buffering or Direct Disk I/O).
In terms of trying to access the "buffer cache" to find "the row" you are looking for, there might be a way (or in near future) to do it, but there would be no way to know if what you are looking for ("The row") is there or not after all.
Also, full table scans of large tables usually result in physical disk reads and a lower buffer cache hit ratio.You can get an idea of full table scan activity at the data file level by querying
v$filestat
and joining toSYS.dba_data_files
. Following is a query you can use and sample results:Since this whole ordeal is highly based on multiple parameters and statistics, the results of what you are looking for may remain a probability driven off of those facotrs.