I'm working on a program that does heavy random-access reads and writes on huge files (up to 64 GB). The files are specifically structured, and to access them I've created a framework; after a while I tried to test its performance, and I noticed that sequential write operations on a preallocated file are too slow to be acceptable.
After many tests I replicated the behavior without my framework (using only FileStream methods); here's the portion of code that (on my hardware) reproduces the issue:
// Open a preallocated file for read/write access
FileStream fs = new FileStream("test1.vhd", FileMode.Open);
byte[] buffer = new byte[256 * 1024];   // 256 KB write buffer
Random rand = new Random();
rand.NextBytes(buffer);                 // fill the buffer with random data
DateTime start, end;
double elapsed = 0.0;
long startPos, endPos;

// Read the file header: two UInt32s plus 65536 UInt16s (128 KB)
BinaryReader br = new BinaryReader(fs);
br.ReadUInt32();
br.ReadUInt32();
for (int i = 0; i < 65536; i++)
    br.ReadUInt16();
br = null;   // note: the reader is discarded, not disposed, so fs stays open

startPos = 0;          // 0
endPos = 4294967296;   // 4 GB

// Sequentially overwrite 4 GB in 256 KB chunks, timing each write
for (long index = startPos; index < endPos; index += buffer.Length)
{
    start = DateTime.Now;
    fs.Write(buffer, 0, buffer.Length);
    end = DateTime.Now;
    elapsed += (end - start).TotalMilliseconds;
}
Unfortunately, the issue seems to be unpredictable: sometimes it "works", sometimes it doesn't. However, using Process Monitor I caught the following events:
Operation Result Detail
WriteFile SUCCESS Offset: 1.905.655.816, Length: 262.144
WriteFile SUCCESS Offset: 1.905.917.960, Length: 262.144
WriteFile SUCCESS Offset: 1.906.180.104, Length: 262.144
WriteFile SUCCESS Offset: 1.906.442.248, Length: 262.144
WriteFile SUCCESS Offset: 1.906.704.392, Length: 262.144
WriteFile SUCCESS Offset: 1.906.966.536, Length: 262.144
ReadFile SUCCESS Offset: 1.907.228.672, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile SUCCESS Offset: 1.907.228.680, Length: 262.144
ReadFile SUCCESS Offset: 1.907.355.648, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile SUCCESS Offset: 1.907.490.816, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile SUCCESS Offset: 1.907.490.824, Length: 262.144
ReadFile SUCCESS Offset: 1.907.617.792, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile SUCCESS Offset: 1.907.752.960, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile SUCCESS Offset: 1.907.752.968, Length: 262.144
That is, after over-writing almost 2 GB, FileStream.Write starts to trigger a ReadFile after every WriteFile, and this continues until the end of the process; also, the offset at which the issue begins seems to be random. I've debugged step-by-step inside the FileStream.Write method and verified that it is actually the WriteFile (Win32 API) call that, internally, causes the ReadFile.
One last note: I don't think this is a file-fragmentation issue; I've defragmented the file myself with contig!
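For what it's worth, one detail in the trace stands out: the paging reads land on page-aligned offsets (e.g. 1.907.228.672), while the corresponding writes are 8 bytes past them (1.907.228.680). The two UInt32 header reads shift every subsequent 256 KB write off 4 KB page alignment, so each write straddles partial pages at both ends, and the cache manager must read those pages in before modifying them (read-modify-write). A minimal sketch of a workaround, assuming misalignment really is the trigger, is to snap the stream position back to a page boundary before the bulk writes begin:

```csharp
// Sketch only: assumes the Non-cached/Paging-I/O reads are read-modify-write
// operations caused by writes that straddle 4 KB page boundaries.
const int PageSize = 4096;

using (FileStream fs = new FileStream("test1.vhd", FileMode.Open))
{
    // ... header reads as in the original code ...

    // Snap the position back to the previous page boundary so every
    // subsequent write starts and ends page-aligned.
    fs.Position = fs.Position & ~((long)PageSize - 1);

    byte[] buffer = new byte[256 * 1024];  // 256 KB is a multiple of 4 KB
    // ... bulk write loop as in the original code ...
}
```

Whether this removes the ReadFile calls on a given machine would need to be confirmed with Process Monitor, but it costs nothing to try.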
Comments (2)
I believe this has to do with FileStream.Write / Read and a 2 GB limit. Are you running this in a 32-bit process? I could not find any specific documentation on this, but here is an MSDN forum question that sounds like the same problem. You could try running it in a 64-bit process.
I agree, however, that using a memory-mapped file may be a better approach.
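As a quick check of the 32-bit hypothesis, the test program can log its own bitness before starting the write loop (IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one, on any .NET version):

```csharp
// Quick diagnostic: confirm whether the benchmark runs as 32-bit or 64-bit.
// IntPtr.Size works on all .NET versions; Environment.Is64BitProcess
// is the clearer equivalent on .NET 4.0 and later.
bool is64Bit = IntPtr.Size == 8;
Console.WriteLine("Running as a {0}-bit process", is64Bit ? 64 : 32);
```

If it reports 32-bit even on a 64-bit OS, check whether the project is built with the "Prefer 32-bit" / x86 platform target.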
I found this on MSDN. Could it be related? It sounds to me like each file handle has one globally shared position pointer.
When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. In this case, the cached position in the FileStream object and the cached data in the buffer could be compromised. The FileStream object routinely performs checks on methods that access the cached buffer to assure that the operating system's handle position is the same as the cached position used by the FileStream object.
http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx
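If the position-verification behavior described in that excerpt is involved, one way to rule it out is to make sure the FileStream keeps exclusive ownership of its handle: open the file with FileShare.None and never touch fs.SafeFileHandle (exposing the handle is what makes FileStream start reconciling its cached position with the OS file pointer). A hedged sketch:

```csharp
// Sketch: keep the FileStream's handle private and exclusive so it has no
// reason to re-check the OS file pointer against its cached position.
FileStream fs = new FileStream(
    "test1.vhd",
    FileMode.Open,
    FileAccess.ReadWrite,
    FileShare.None,      // no other handle to this file can exist
    256 * 1024);         // internal buffer sized to the write chunk
// Do NOT read fs.SafeFileHandle (or fs.Handle) here; exposing the handle
// enables the extra position checks described in the MSDN excerpt above.
```

This is only a way to eliminate one suspect, not a confirmed cause: in the original repro no second thread touches the handle, so the checks may never fire at all.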