How to handle a large number of concurrent disk write requests as efficiently as possible
Say the method below is called several thousand times by different threads in a .NET 4 application. What's the best way to handle this situation? I understand that the disk is the bottleneck here, but I'd like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
    public void WriteFile(string FileName, MemoryStream Data)
    {
        try
        {
            using (FileStream DiskFile = File.OpenWrite(FileName))
            {
                Data.WriteTo(DiskFile);
                DiskFile.Flush();
                DiskFile.Close();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
Comments (4)
If you want to return quickly and don't really care whether the operation is synchronous, you could create some kind of in-memory Queue where you put write requests; as long as the queue is not full, you can return from the method quickly. Another thread will be responsible for draining the queue and writing the files. If your WriteFile is called while the queue is full, you will have to wait until you can enqueue, and execution becomes synchronous again. But this way you can have a big buffer, so if the write requests do not arrive at a steady rate but in spikes (with pauses between bursts of calls), such a change can be seen as an improvement in your performance.
UPDATE: Made a little picture for you. Notice that the bottleneck always exists; all you can do is optimize requests by using a queue. Notice that the queue has a limit, so when it fills up you cannot instantly enqueue files; you have to wait until there is free space in that buffer too. But for the situation shown in the picture (3 bucket requests), it's obvious you can quickly put the buckets into the queue and return, while in the first case you have to do that one by one and block execution.
Notice that you never need to run many IO threads at once, since they will all be hitting the same bottleneck, and you will just waste memory if you try to parallelize this heavily. I believe 2 to 10 threads at most will easily use all available IO bandwidth, and will limit application memory usage too.
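To make the bounded-queue idea concrete, here is a minimal sketch, assuming .NET 4's BlockingCollection<T> as the queue. The class name QueuedFileWriter, the WriteRequest type and the capacity and worker-count parameters are made up for illustration and are not from the answer above. It also uses File.WriteAllBytes, which overwrites an existing file, whereas the question's File.OpenWrite does not truncate.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper: a bounded in-memory queue drained by a few writer threads.
public sealed class QueuedFileWriter : IDisposable
{
    private sealed class WriteRequest
    {
        public string FileName;
        public byte[] Data;
    }

    private readonly BlockingCollection<WriteRequest> _queue;
    private readonly Thread[] _workers;

    public QueuedFileWriter(int capacity, int workerCount)
    {
        // The bounded capacity is what makes Add block once the queue is full.
        _queue = new BlockingCollection<WriteRequest>(capacity);
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(DrainQueue) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Returns as soon as the request is queued; blocks only while the queue is full.
    public void WriteFile(string fileName, MemoryStream data)
    {
        // Snapshot the bytes so the caller may reuse or dispose the stream.
        _queue.Add(new WriteRequest { FileName = fileName, Data = data.ToArray() });
    }

    private void DrainQueue()
    {
        // Ends once CompleteAdding has been called and the queue is empty.
        foreach (WriteRequest request in _queue.GetConsumingEnumerable())
        {
            try
            {
                File.WriteAllBytes(request.FileName, request.Data);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    // Stops accepting requests and waits for the queued writes to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread worker in _workers)
        {
            worker.Join();
        }
        _queue.Dispose();
    }
}
```

Something like `new QueuedFileWriter(64, 2)` would keep at most 64 pending requests in memory and use two writer threads; both numbers are guesses that would need tuning against the real payload sizes and disk.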
Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task. The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
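The code that accompanied this answer did not survive on this page, so the following is only a sketch of what a Task-based version might look like on .NET 4, not the answerer's original. The names TaskFileWriter and WriteFileAsync are hypothetical. The stream is copied to a byte array up front, on the assumption that the caller may dispose or reuse the MemoryStream before the task actually runs.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical wrapper: hands each write to the thread pool via the TPL.
public static class TaskFileWriter
{
    // Queues the write and returns immediately with the Task that tracks it.
    public static Task WriteFileAsync(string fileName, MemoryStream data)
    {
        byte[] bytes = data.ToArray(); // snapshot taken on the calling thread
        return Task.Factory.StartNew(() => WriteFile(fileName, bytes));
    }

    private static void WriteFile(string fileName, byte[] bytes)
    {
        try
        {
            using (FileStream diskFile = File.OpenWrite(fileName))
            {
                diskFile.Write(bytes, 0, bytes.Length);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

A caller that needs to know when everything has reached the disk can keep the returned tasks and pass them to Task.WaitAll before shutting down. Nothing here bounds the number of pending writes, so a sustained burst will hold every queued payload in memory.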
If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, and a separate thread servicing that queue, works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quickly. Not to mention that your data, which can be several megabytes for each file, is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading the file that WriteFile is writing to. That thread reads each record header (number, file name, length), opens a new file, and then copies the data from the log file to the final file.
This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
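The record layout described above (record number, file name, length, data) could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: the class and method names are made up, the layout uses BinaryWriter's default encodings, and ReplayLog reads a log that is no longer being appended to, whereas the answer describes a second thread that follows the log while it is still being written.

```csharp
using System;
using System.IO;

// Hypothetical sketch of an append-only write log and a replay step.
public static class WriteLog
{
    private static readonly object Gate = new object();

    // Appends one record: number, file name, payload length, then the raw bytes.
    // The lock keeps records from different threads from interleaving.
    public static void AppendRecord(BinaryWriter log, int recordNumber, string fileName, byte[] data)
    {
        lock (Gate)
        {
            log.Write(recordNumber);
            log.Write(fileName);     // BinaryWriter length-prefixes strings
            log.Write(data.Length);
            log.Write(data);
        }
    }

    // Reads every record in the log and copies its payload to the final file.
    // Returns the number of records processed.
    public static int ReplayLog(string logPath)
    {
        int count = 0;
        using (FileStream stream = File.OpenRead(logPath))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                reader.ReadInt32();                    // record number, unused in this sketch
                string fileName = reader.ReadString();
                int length = reader.ReadInt32();
                byte[] data = reader.ReadBytes(length);
                File.WriteAllBytes(fileName, data);
                count++;
            }
        }
        return count;
    }
}
```

The producer side would open the log once, for example with `new BinaryWriter(new FileStream(logPath, FileMode.Append, FileAccess.Write))`, and keep it open, so each WriteFile call pays only for a sequential append rather than for opening a new file.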
Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
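A minimal sketch of that suggestion, assuming one dedicated thread per call. The names ThreadFileWriter and WriteFileInNewThread are hypothetical. Every call creates a thread with its own stack, so with several thousand concurrent calls this is likely to cost far more memory than the queue-based or Task-based approaches.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical wrapper: runs each write on its own fire-and-forget thread.
public static class ThreadFileWriter
{
    // Starts the write on a new thread and returns without waiting for it.
    public static Thread WriteFileInNewThread(string fileName, MemoryStream data)
    {
        byte[] bytes = data.ToArray(); // snapshot so the caller can reuse the stream
        Thread thread = new Thread(() =>
        {
            try
            {
                using (FileStream diskFile = File.OpenWrite(fileName))
                {
                    diskFile.Write(bytes, 0, bytes.Length);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        });
        thread.Start(); // foreground thread: the process stays alive until it finishes
        return thread;
    }
}
```

The returned Thread can simply be ignored for true fire-and-forget, or joined later if the caller needs to know that the write has completed.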