Oracle 查询输出在假脱机到文件时不完整
最近,我不得不增加数据提取脚本的数量,并且我发现输出丢失了源表中存在的一些数据。任何人都可以提出一些可能导致此问题的想法吗?
背景
提取过程是一个 sql 脚本,我从 sqlplus 客户端(在 Windows Server 2008 中运行)执行该脚本,并将查询结果假脱机到输出文件。
提取的数据包括客户记录及其交易记录。每个客户每个月可能有数百笔交易记录。在增加负载之前,我们通过过去 3 个月的交易吸引了大约 10 万客户。现在这个数字已经增加到100万。我预计输出文件约为 30GB。
交易表分为每月分区。提取脚本包括一个单独的查询假脱机每个分区。看起来3个月的交易中的第一个已经输出,而第2和第3个月的交易丢失了(尽管进程日志显示它仍然花费了时间查询所有分区)。
一些有趣的进展
这个 sql 脚本是在 perl 脚本中调用的。本来sql脚本是一次性调用的。我决定拆分 sql 并依次调用每个查询,输出的大小似乎增加了两倍(大约 30GB)。为什么会发生这种情况? (这可能有助于解决下面的实际问题。)
当前问题
第二个数据库包含 5 倍以上的客户和 2 种类型的事务表,每个表都划分为每日分区。因此,为了提取 3 个月的事务,该脚本对 2 个单独表的 90 个分区进行查询。同样,第1个月的数据似乎输出成功,但第2个月和第3个月的数据缺失。上述解决方案不适用于该数据库。
其他不寻常的事情是,从 2 个事务表输出的 2 个文件的大小几乎都是 4GB(4,294,967,362 字节)。难道是 Oracle 或其他什么东西限制了提取大小?
Recently I've had to increase the volume of an data extraction script and I've found the output to be missing some data that is present in the source tables. Can anyone suggest some ideas for what might be causing this issue?
Background
The extraction procedure is an sql script which I execute from an sqlplus client (running in Windows Server 2008) and which spools the query results to output files.
The data that is extracted includes customer records and their transaction records. Each customer may have several hundred transaction records each month. Before the increased load, we were extracting about 100K customers with the last 3 months of transactions. This has now increased to 1 million. I expect that the output file would be about 30GB.
The transactions table is partitioned into monthly partitions. The extraction script includes a separate query spooling each partition. It seemed that the first of the 3 month's transactions was output, and the 2nd and 3rd month of transactions was missing (though the process log showed that it still spent the time querying all partitions).
Some interesting progress
This sql script is invoked within a perl script. Originally the sql script was invoked in one go. I decided to split up the sql and to invoke each query in turns and the output seems to have tripled in size (to about 30GB). Why might this have happened? (This may help with the real problem below.)
The current issue
There is a second database with 5x more customers and 2 types of transaction tables, each partitioned into daily partitions. So to extract 3 months of transactions, the script queries on 90 partitions on 2 separate tables. Again, the 1st month of data seems to have output successfully, but the 2nd and 3rd months are missing. The above solution did not work for this database.
Something else that is unusual is that the 2 files output from the 2 transaction tables are both almost exactly 4GB in size (4,294,967,362 bytes). Could it be that Oracle or something else is limiting the extraction size?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
Windows 版本是什么以及磁盘上的文件系统是什么?从内存来看,FAT32 文件系统有文件大小限制。认为您需要 NTFS 来支持 2G 以上(或者可能是 4G)。我怀疑你已经达到了这个限制,无论是在 Windows 中还是在 sqlplus 本身中。
来缩小范围
您可以尝试通过尝试而不是使用 spool 命令 。如果有效的话,看起来像是 sqlplus 问题。如果没有,则更有可能是文件系统问题。
What version of windows and what is the filesystem on your disk? From memory the FAT32 filesystem has a filesize limit. Think you need NTFS for over 2G (or maybe thats 4G). I suspect that you are hitting that limit, either in windows or sqlplus itself.
You could try narrowing it down by trying
as opposed to using the
spool
command. If that works it looks like a sqlplus issue. If it doesn't then more likely a filesystem issue.