File operations with 64-bit addresses in C on MinGW32
I'm trying to read in a 24 GB XML file in C, but it won't work. I'm printing out the current position using ftell() as I read it in, but once it gets to a big enough number, it goes back to a small number and starts over, never even getting 20% through the file. I assume this is a problem with the range of the variable that's used to store the position (long), which according to http://msdn.microsoft.com/en-us/library/s3f49ktz(VS.80).aspx only goes up to 2,147,483,647 (4,294,967,295 for an unsigned long), while my file is 25,000,000,000 bytes in size. A long long should work, but how would I change what my compiler (Cygwin/MinGW32) uses, or get it to provide fopen64()?
Answers (6)
The ftell() function returns a long, which is only 32 bits wide on 32-bit systems (and on Windows, even in 64-bit builds), so it can only represent offsets up to 2^31 - 1 bytes (about 2 GB). The file offset of a 24 GB file therefore cannot fit into a 32-bit long. You may have a 64-bit variant such as ftell64() available, or the standard fgetpos() function may return a larger offset to you.
You might try using the OS-provided file functions CreateFile and ReadFile. According to the File Pointers topic, the file position is stored as a 64-bit value.
Unless you can use a 64-bit method as suggested by Loadmaster, I think you will have to break the file up.
This resource seems to suggest it is possible using _telli64(). I can't test this, though, as I don't use MinGW.
I don't know of any way to do this within a single file. It's a bit of a hack, but if splitting the file up properly isn't a real option, you could write a few functions: one that temporarily splits the file, one that uses ftell() to move through the file and switches to the next piece when it reaches a split point, and another that stitches the pieces back together before exiting. It's an absolutely botched-up approach, but if no better solution comes to light, it could be a way to get the job done.
I found the answer. Instead of using fopen, fseek, fread, fwrite..., I'm using _open, _lseeki64, read and write, and I am able to write and seek in files larger than 4 GB.
Edit: It seems the latter functions are about 6x slower than the former ones. I'll give the bounty to anyone who can explain that.
Edit: Oh, I learned here that read() and friends are unbuffered. What is the difference between read() and fread()?
Even if the ftell() in the Microsoft C library returns a 32-bit value and thus obviously will return bogus values once you reach 2 GB, just reading the file should still work fine. Or do you need to seek around in the file, too? For that you need _ftelli64() and _fseeki64().
Note that unlike some Unix systems, you don't need any special flag when opening the file to indicate that it is in some "64-bit mode". The underlying Win32 API handles large files just fine.