Win32 ReadFile() refuses to read more than 16 MB in a single call?
I've got a very strange problem with Windows XP running in VirtualBox. The ReadFile() function refuses to read more than 16 MB of data in a single call. It returns error code 87 (ERROR_INVALID_PARAMETER). It looks like the data length is limited to 24 bits.

Here is the example code that allowed me to find the exact limit.
#include <conio.h>
#include <errno.h>     /* errno */
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <stdlib.h>    /* malloc */
#include <string.h>    /* strerror */
#include <sys/stat.h>
#include <tchar.h>     /* _tmain, _TCHAR */

int _tmain(int argc, _TCHAR* argv[])
{
    int fd, len, readed;
    char *buffer;
    char *fname = "Z:\\test.dat";

    fd = _open(fname, _O_RDWR | _O_BINARY, _S_IREAD | _S_IWRITE);
    if (fd == -1) {
        printf("Error opening file : %s\n", strerror(errno));
        getch();
        return -1;
    }
    len = _lseek(fd, 0, SEEK_END);
    _lseek(fd, 0, SEEK_SET);
    if (!len) {
        printf("File length is 0.\n");
        getch();
        return -2;
    }
    buffer = (char *)malloc(len);
    if (!buffer) {
        printf("Failed to allocate memory.\n");
        getch();
        return -3;
    }
    readed = 0;
    while (readed < len) {
        len -= 100;
        readed = _read(fd, buffer, len);
        if (len <= 100) break;
    }
    if (readed != len) {
        printf("Failed to read file: result %d error %s\n", readed, strerror(errno));
        getch();
        return -4;
    }
    _close(fd);
    printf("Success (%u).", len);
    getch();
    return 0;
}
The file Z:\test.dat is 21 MB long. The result is "Success (16777200)."

I tried to find the same issue on Google, without any success :(

Maybe someone knows what the cause of the problem is?
Comments (5)
The problem is not with ReadFile() itself. The real problem is that your while() loop is buggy to begin with. You are mismanaging the len and readed variables. On each iteration of the loop, you decrement len and reset readed. Eventually, len is decremented to a value that matches readed, and the loop stops running. The fact that your "Success" message reports 16 MB is a coincidence, because you are modifying both variables while you read the file. len is initially set to 21 MB and counts down until _read() happens to return a 16 MB buffer when 16 MB was asked for. That does not mean that ReadFile() failed on a 16 MB read (if that were the case, the very first loop iteration would fail, because it asks for a 21 MB read).

You need to fix your while() loop, not blame ReadFile(). The correct looping logic should look more like this instead:
I would recommend that you use Memory-Mapped Files. (see also http://msdn.microsoft.com/en-us/library/aa366556.aspx). The following simple code shows one way to do this:
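The code listing this answer refers to was not captured in this copy. A minimal Windows-only sketch of the sequence it describes (reusing the pSrcFile, dwInFileSizeHigh, and dwInFileSizeLow names from the text; error handling abbreviated, so this is an illustration rather than the answerer's exact code) might look like:

```c
#include <windows.h>

int main(void)
{
    /* Open the file with a hint that we will scan it sequentially. */
    HANDLE hInFile = CreateFileA("Z:\\test.dat", GENERIC_READ, FILE_SHARE_READ,
                                 NULL, OPEN_EXISTING,
                                 FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                                 NULL);
    if (hInFile == INVALID_HANDLE_VALUE)
        return 1;

    DWORD dwInFileSizeHigh = 0;
    DWORD dwInFileSizeLow = GetFileSize(hInFile, &dwInFileSizeHigh);

    /* Create a read-only file-mapping object covering the whole file. */
    HANDLE hMapping = CreateFileMapping(hInFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMapping == NULL) {
        CloseHandle(hInFile);
        return 2;
    }

    /* Map the entire file into the address space; pSrcFile then points at
       all ((__int64)dwInFileSizeHigh << 32) + dwInFileSizeLow bytes. */
    const char *pSrcFile = (const char *)MapViewOfFile(hMapping, FILE_MAP_READ,
                                                       0, 0, 0);
    if (pSrcFile != NULL) {
        /* ... access the file contents directly through pSrcFile ... */
        UnmapViewOfFile(pSrcFile);
    }
    CloseHandle(hMapping);
    CloseHandle(hInFile);
    return 0;
}
```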
After some simple steps you have a pointer pSrcFile which represents the whole file contents. Is this not what you need? The total size of the memory block is stored in dwInFileSizeHigh and dwInFileSizeLow: ((__int64)dwInFileSizeHigh << 32) + dwInFileSizeLow.

This uses the same feature of the Windows kernel that is used to implement the swap file (page file). It is buffered by the disk cache and very efficient. If you plan to access the file mostly sequentially, including the flag FILE_FLAG_SEQUENTIAL_SCAN in the call to CreateFile() will hint this fact to the system, causing it to try to read ahead for even better performance.

I see that the file you read in the test example has the name "Z:\test.dat". If it is a file coming from a network drive, you will see a clear performance advantage. Moreover, according to http://msdn.microsoft.com/en-us/library/aa366542.aspx you have a limit of about 2 GB instead of 16 MB. I recommend you map files up to 1 GB and then just create a new view with MapViewOfFile (I am not sure that your code needs to work with such large files). More than that, on the same MSDN page you can read the following.

So the usage of memory-mapped files is really cheap. If your program reads only portions of the file contents, skipping large parts of the file, then you will also have a large performance advantage, because it will read only the parts of the file which you really accessed (rounded to 16K pages).

Cleaner code for file mapping is the following:
It is entirely legal for a device driver to return fewer bytes than requested. That's why ReadFile() has the lpNumberOfBytesRead argument. You should avoid low-level CRT implementation details like _read(); use fread() instead.

Update: this isn't the correct answer. It looks like your virtual machine simply refuses to honor ReadFile() calls that ask for more than 16 MB. It probably has something to do with an internal buffer it uses to talk to the host operating system. There is nothing you can do but call fread() in a loop so that you stay below this upper limit.
I assume Z: in your example is a shared folder. I just stumbled upon the same bug and spent some time trying to pin it down.

It seems the problem has been known for a while: https://www.virtualbox.org/ticket/5830.
I think that this is a limitation in Windows. In my experience, on Windows XP and 2003, x86 and x64, ReadFile() fails to read more than 16 MB. On Windows 2008 R2 and Windows 8 x64 the threshold is much higher (> 1 GB). I am using unbuffered I/O for a backup utility.

I have never used MMF, but ReadFile and WriteFile are extremely fast with FILE_FLAG_NO_BUFFERING, and the CPU usage is almost 0 while reading at 147 MB/s (Intel i7).