Win32 ReadFile() refuses to read more than 16 MB in a single call?

Posted on 2024-09-16 10:44:23

I've got a very strange problem on my Windows XP in VirtualBox.

The ReadFile() function refuses to read more than 16 MB of data in a single call. It returns error code 87 (ERROR_INVALID_PARAMETER). It looks like the data length is limited to 24 bits.

Here is the example code that let me find the exact limit:

#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <io.h>
#include <tchar.h>
#include <sys/stat.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int fd,len,readed;
    char *buffer;
    char *fname="Z:\\test.dat";
    fd=_open(fname,_O_RDWR|_O_BINARY,_S_IREAD|_S_IWRITE);
    if (fd==-1) {
        printf("Error opening file : %s\n",strerror(errno));
        getch();
        return -1;
    }
    len=_lseek(fd,0,SEEK_END);
    _lseek(fd,0,SEEK_SET);
    if (!len) {
        printf("File length is 0.\n");
        getch();
        return -2;
    }
    buffer=(char *)malloc(len);
    if (!buffer) {
        printf("Failed to allocate memory.\n");
        getch();
        return -3;
    }
    readed=0;
    while (readed<len) {
        len-=100;
        readed=_read(fd,buffer,len);
        if (len<=100) break;
    }
    if (readed!=len) {
        printf("Failed to read file: result %d error %s\n",readed,strerror(errno));
        getch();
        return -4;
    }
    _close(fd);
    printf("Success (%u).",len);
    getch();
    return 0;
}

The file Z:\test.dat is 21 MB long.

The result is "Success (16777200)."

I tried to find the same issue on Google, without any success :(

Maybe someone knows the cause of the problem?

南七夏 2024-09-23 10:44:23

The problem is not with ReadFile() itself. The real problem is that your while() loop is buggy to begin with. You are mismanaging the len and readed variables: on each iteration of the loop, you decrement len and overwrite readed. Eventually, len is decremented to a value that matches readed and the loop stops running. The fact that your "Success" message reports 16MB is a coincidence, because you are modifying both variables while you read the file. len is initially set to 21MB and counts down until _read() happens to return a 16MB buffer when 16MB was asked for. That does not mean that ReadFile() failed on a 16MB read (if that were the case, the very first loop iteration would have failed, because it asks for a 21MB read).

You need to fix your while() loop, not blame ReadFile(). The correct looping logic should look more like this instead:

int total = 0; 

while (total < len)
{ 
    readed = _read(fd, &buffer[total], len-total); 
    if (readed < 1) break;
    total += readed;
} 

_close(fd); 

if (total != len)
{ 
    printf("Failed to read file: %d out of %d, error %s\n", total, len, strerror(errno)); 
    ...
    return -4; 
} 

printf("Success (%u).",total); 
...
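
Putting the pieces together, here is a hedged, self-contained sketch of that corrected loop (the helper name read_fully and the POSIX fallback shim are illustrative additions, not from the thread):

```c
#include <fcntl.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
#else
/* Assumption: on POSIX systems the underscore-prefixed CRT calls map to the plain ones. */
#include <unistd.h>
#define _open  open
#define _read  read
#define _close close
#define _O_RDONLY O_RDONLY
#define _O_BINARY 0
#endif

/* Read up to `len` bytes from `fname` into `buffer`, looping until the
   requested amount is read or _read() reports EOF/error.
   Returns the number of bytes actually read, or -1 if the open fails. */
long read_fully(const char *fname, char *buffer, long len)
{
    int fd = _open(fname, _O_RDONLY | _O_BINARY);
    if (fd == -1)
        return -1;

    long total = 0;
    while (total < len) {
        /* Ask for the remainder; accept however much the OS hands back. */
        long readed = _read(fd, buffer + total, (unsigned int)(len - total));
        if (readed < 1)
            break; /* EOF or error: stop instead of looping forever */
        total += readed;
    }
    _close(fd);
    return total;
}
```

The key differences from the question's loop: len is never modified, and each _read() resumes at buffer + total.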

盛装女皇 2024-09-23 10:44:23

I would recommend that you use Memory-Mapped Files. (see also http://msdn.microsoft.com/en-us/library/aa366556.aspx). The following simple code shows one way to do this:

LPCTSTR pszSrcFilename = TEXT("Z:\\test.dat");
HANDLE hSrcFile = CreateFile (pszSrcFilename, GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                              NULL);
HANDLE hMapSrcFile = CreateFileMapping (hSrcFile, NULL, PAGE_READONLY, 0, 0, NULL);
PBYTE pSrcFile = (PBYTE) MapViewOfFile (hMapSrcFile, FILE_MAP_READ, 0, 0, 0);
DWORD dwInFileSizeHigh, dwInFileSizeLow;
dwInFileSizeLow = GetFileSize (hSrcFile, &dwInFileSizeHigh);

After some simple steps you have a pointer pSrcFile that represents the whole file contents. Is this not what you need? The total size of the memory block is stored in dwInFileSizeHigh and dwInFileSizeLow: ((__int64)dwInFileSizeHigh << 32) + dwInFileSizeLow.
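
That high/low combination is easy to get wrong, so here is a tiny portable sketch of it (the uint32_t stand-in for DWORD and the helper name are assumptions for building outside windows.h):

```c
#include <stdint.h>

typedef uint32_t DWORD; /* stand-in for the Win32 DWORD when windows.h is unavailable */

/* Combine the two 32-bit halves returned by GetFileSize() into a 64-bit size. */
int64_t combine_file_size(DWORD dwHigh, DWORD dwLow)
{
    return ((int64_t)dwHigh << 32) + dwLow;
}
```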

This uses the same feature of the Windows kernel that is used to implement the swap file (page file). It is buffered by the disk cache and very efficient. If you plan to access the file mostly sequentially, including the FILE_FLAG_SEQUENTIAL_SCAN flag in the call to CreateFile() hints this to the system, causing it to read ahead for even better performance.

I see that the file you read in the test example is named "Z:\test.dat". If it is a file on a network drive, you will see a clear performance advantage. Moreover, according to http://msdn.microsoft.com/en-us/library/aa366542.aspx the limit is about 2 GB instead of 16 MB. I recommend mapping files up to about 1 GB and creating a new view with MapViewOfFile beyond that (I am not sure your code needs to work with files that large). More than that, on the same MSDN page you can read the following:

The size of the file mapping object that you select controls how far into the file you can "see" with memory mapping. If you create a file mapping object that is 500 Kb in size, you have access only to the first 500 Kb of the file, regardless of the size of the file. Since it does not cost you any system resources to create a larger file mapping object, create a file mapping object that is the size of the file (set the dwMaximumSizeHigh and dwMaximumSizeLow parameters of CreateFileMapping both to zero) even if you do not expect to view the entire file. The cost in system resources comes in creating the views and accessing them.

So the use of memory-mapped files is really cheap. If your program reads only portions of the file contents, skipping large parts of the file, you will also gain a large performance advantage, because only the parts of the file you actually access are read in (rounded up to page granularity).

Cleaner code for the file mapping follows:

DWORD MapFileInMemory (LPCTSTR pszFileName,
                       PBYTE *ppbyFile,
                       PDWORD pdwFileSizeLow, OUT PDWORD pdwFileSizeHigh)
{
    HANDLE  hFile = INVALID_HANDLE_VALUE, hFileMapping = NULL;
    DWORD dwStatus = NO_ERROR;

    __try {
        hFile = CreateFile (pszFileName, FILE_READ_DATA, 0, NULL, OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                            NULL);
        if (hFile == INVALID_HANDLE_VALUE) {
            dwStatus = GetLastError();
            __leave;
        }

        *pdwFileSizeLow = GetFileSize (hFile, pdwFileSizeHigh);
        if (*pdwFileSizeLow == INVALID_FILE_SIZE){
            dwStatus = GetLastError();
            __leave;
        }

        hFileMapping = CreateFileMapping (hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!hFileMapping){
            dwStatus = GetLastError();
            __leave;
        }

        *ppbyFile = (PBYTE) MapViewOfFile (hFileMapping, FILE_MAP_READ, 0, 0, 0);
        if (*ppbyFile == NULL) {
            dwStatus = GetLastError();
            __leave;
        }
    }
    __finally {
        if (hFileMapping) CloseHandle (hFileMapping);
        if (hFile != INVALID_HANDLE_VALUE) CloseHandle (hFile);
    }

    return dwStatus;
}

BOOL UnmapFileFromMemory (LPCVOID lpBaseAddress)
{
    return UnmapViewOfFile (lpBaseAddress);
}

往日情怀 2024-09-23 10:44:23

It is entirely legal for a device driver to return fewer bytes than requested. That's why ReadFile() has the lpNumberOfBytesRead argument. You should avoid low-level CRT implementation details like _read(). Use fread() instead.

Update: this isn't the correct answer. It looks like your virtual machine simply refuses to consider ReadFile() calls that ask for more than 16MB. Probably has something to do with an internal buffer it uses to talk to the host operating system. Nothing you can do but call fread() in a loop so you can stay below this upper limit.
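
A minimal sketch of that fread() loop (the 4 MB chunk size and the function name are illustrative; any size below the observed 16 MB ceiling would do):

```c
#include <stdio.h>

#define CHUNK_SIZE (4L * 1024 * 1024) /* 4 MB: comfortably below the 16 MB ceiling */

/* Read a file into `buffer` through fread(), never asking for more than
   CHUNK_SIZE bytes at a time. Returns the total number of bytes read,
   or -1 if the file cannot be opened. */
long read_in_chunks(const char *fname, char *buffer, long capacity)
{
    FILE *fp = fopen(fname, "rb");
    if (!fp)
        return -1;

    long total = 0;
    while (total < capacity) {
        long want = capacity - total;
        if (want > CHUNK_SIZE)
            want = CHUNK_SIZE; /* cap each request below the problematic size */
        size_t got = fread(buffer + total, 1, (size_t)want, fp);
        if (got == 0)
            break; /* EOF or error */
        total += (long)got;
    }
    fclose(fp);
    return total;
}
```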

鹿童谣 2024-09-23 10:44:23

I assume Z: in your example is a shared folder. I just stumbled upon the same bug and spent some time trying to pin it down.

It seems the problem has been known for a while: https://www.virtualbox.org/ticket/5830.

情释 2024-09-23 10:44:23

I think this is a limitation in Windows. In my experience, on Windows XP and 2003 (x86 and x64), ReadFile() fails to read more than 16 MB. On Windows 2008 R2 and Windows 8 x64 the threshold is much higher, over 1 GB. I am using unbuffered I/O for a backup utility.

I have never used MMF, but ReadFile and WriteFile are extremely fast with FILE_FLAG_NO_BUFFERING. The CPU usage is almost 0 while reading at 147 MB/s (on an Intel i7).
