Speeding up parsing with multithreading
I have a parse method in my program which first reads a file from disk, then parses the lines and creates an object for every line. For every file, a collection with the objects from the lines is saved afterwards. The files are about 300MB.
This takes about 2.5-3 minutes to complete.
My question: can I expect a significant speed-up if I split the task up so that one thread just reads the files from disk, another parses the lines and a third saves the collections? Or would this maybe slow the process down?
How long does it commonly take a modern notebook hard disk to read 300MB? I think the bottleneck in my task is the CPU, because when I execute the method one core of the CPU is always at 100% while the disk is idle more than half the time.
Greetings, rain
EDIT:
private CANMessage parseLine(String line)
{
    try
    {
        CANMessage canMsg = new CANMessage();
        int offset = 0;
        int offset_add = 0;
        char[] delimiterChars = { ' ', '\t' };
        string[] elements = line.Split(delimiterChars);
        if (!isMessageLine(ref elements))
        {
            return null;
        }
        offset = getPositionOfFirstWord(ref elements);
        canMsg.TimeStamp = Double.Parse(elements[offset]);
        offset += 3;
        offset_add = getOffsetForShortId(ref elements, ref offset);
        canMsg.ID = UInt16.Parse(elements[offset], System.Globalization.NumberStyles.HexNumber);
        offset += 17; // for signs between identifier and data length number
        canMsg.DataLength = Convert.ToInt16(elements[offset + offset_add]);
        offset += 1;
        parseDataBytes(ref elements, ref offset, ref offset_add, ref canMsg);
        return canMsg;
    }
    catch (Exception exp)
    {
        MessageBox.Show(line);
        MessageBox.Show(exp.Message + "\n\n" + exp.StackTrace);
        return null;
    }
}
So this is the parse method. It works this way, but maybe you are right and it is inefficient. I have .NET Framework 4.0 and I am on Windows 7. I have a Core i7 where every core has Hyper-Threading, so I am only using about 1/8 of the CPU.
EDIT2: I am using Visual Studio 2010 Professional. It looks like the tools for performance profiling are not available in this version (according to the MSDN Beginners Guide to Performance Profiling).
EDIT3: I have now changed the code to use threads. It now looks like this:
foreach (string str in checkedListBoxImport.CheckedItems)
{
    toImport.Add(str);
}
for (int i = 0; i < toImport.Count; i++)
{
    // per-iteration local copy: each lambda captures its own newString
    // rather than the shared loop variable
    String newString = new String(toImport.ElementAt(i).ToArray());
    Thread t = new Thread(() => importOperation(newString));
    t.Start();
}
The parsing you saw above is called inside importOperation(...).
With this code it was possible to reduce the time from about 2.5 minutes to "only" 40 seconds. I got some concurrency problems I still have to track down, but at least this is much faster than before.
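For what it's worth, a hedged sketch of where such problems typically come from in this kind of setup: if importOperation appends its parsed messages to a collection shared by all the import threads, that access has to be synchronized. sharedMessages and addParsedMessage are invented names, not taken from the code above.

// List<T> is not thread-safe, so concurrent Add calls from several
// import threads can corrupt it; serialize access with a lock.
private readonly object importLock = new object();
private List<CANMessage> sharedMessages = new List<CANMessage>();

private void addParsedMessage(CANMessage msg)
{
    lock (importLock)
    {
        sharedMessages.Add(msg);
    }
}

// Alternative on .NET 4.0: use a thread-safe collection and drop the lock,
// e.g. ConcurrentBag<CANMessage> from System.Collections.Concurrent.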
Thank you for your advice.
It's unlikely that you are going to get consistent metrics for laptop hard disk performance, as we have no idea how old your laptop is, nor do we know whether it is solid state or spinning.
Considering you have already done some basic profiling, I'd wager the CPU really is your bottleneck, since it is impossible for a single-threaded application to use more than 100% of a single CPU. This of course ignores your operating system splitting the process over multiple cores and other oddities. If you were seeing 5% CPU usage instead, you would most likely be bottlenecked on IO.
That said, your best bet would be to create a new thread task for each file you are processing and send it to a pooled thread manager. Your thread manager should limit the number of threads you are running to either the number of cores you have available or, if memory is an issue (you did say you were working with 300MB files, after all), the maximum amount of RAM you can use for the process.
Finally, to answer why you wouldn't want a separate thread for each operation, consider what you already know about your performance bottleneck. You are bottlenecked on CPU processing, not on IO. That means that if you split your application into separate read, process and write threads, your read and write threads would be starved most of the time, waiting for the processing thread to finish. Additionally, even if you made them process asynchronously, you run a very real risk of running out of memory as your read thread keeps consuming data that your processing thread can't keep up with.
Thus, be careful not to start every thread immediately; instead, let them be managed by some form of blocking queue. Otherwise you run the risk of slowing your system to a crawl because you spend more time on context switches than on processing. This is of course assuming you don't crash first.
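A minimal sketch of that idea on .NET 4.0 (which the question mentions): the checked file names go into a BlockingCollection and a fixed number of worker threads, one per core here, pull from it, so only a bounded number of files is ever in flight. toImport and importOperation come from the question's EDIT3; everything else is an assumption.

// requires: using System.Collections.Concurrent; using System.Threading;
var fileQueue = new BlockingCollection<string>();
foreach (string str in toImport)
    fileQueue.Add(str);
fileQueue.CompleteAdding();                  // no more files will be queued

int workerCount = Environment.ProcessorCount;
var workers = new Thread[workerCount];
for (int w = 0; w < workerCount; w++)
{
    workers[w] = new Thread(() =>
    {
        // GetConsumingEnumerable blocks while the queue is empty and
        // ends cleanly once the collection is marked complete.
        foreach (string path in fileQueue.GetConsumingEnumerable())
            importOperation(path);           // read + parse + save one file
    });
    workers[w].Start();
}
foreach (Thread w in workers)
    w.Join();                                // wait until every file is done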
It's unclear how many of these 300MB files you've got. From a quick test, a single 300MB file takes about 5 or 6 seconds to read on my netbook. It does indeed sound like you're CPU-bound.
It's possible that threading will help, although it's likely to complicate things significantly of course. You should also profile your current code - it may well be that you're just parsing inefficiently. (For example, if you're using C# or Java and you're concatenating strings in a loop, that's frequently a performance "gotcha" which can be easily remedied.)
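To illustrate that concatenation gotcha with a hedged example (the names here are made up, not from the question's code): building a big string with += copies everything built so far on every pass, while a StringBuilder appends into a buffer.

// requires: using System.Text;
// Quadratic: each += allocates a new string and copies the old contents.
string slow = "";
foreach (string line in lines)
    slow += line;

// Linear: StringBuilder grows an internal buffer instead.
var sb = new StringBuilder();
foreach (string line in lines)
    sb.Append(line);
string fast = sb.ToString();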
If you do opt for a multi-threaded approach, then to avoid thrashing the disk, you may want to have one thread read each file into memory (one at a time) and then pass that data to a pool of parsing threads. Of course, that assumes you've also got enough memory to do so.
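A rough sketch of that shape, again assuming .NET 4.0's BlockingCollection: only one thread touches the disk, reading whole files one at a time, and a small pool of parser threads consumes them. The bounded capacity of 2, ReadAllLines per file and the ParseAndSaveFile helper are all assumptions about how it could be wired up, not code from the question.

// requires: using System.Collections.Concurrent; using System.IO; using System.Threading;
// Capacity 2: at most about two files' worth of lines sit in memory at once.
var fileQueue = new BlockingCollection<Tuple<string, string[]>>(2);

var ioThread = new Thread(() =>
{
    foreach (string path in toImport)                          // file names from the question
        fileQueue.Add(Tuple.Create(path, File.ReadAllLines(path)));
    fileQueue.CompleteAdding();
});
ioThread.Start();

var parsers = new Thread[Environment.ProcessorCount];
for (int p = 0; p < parsers.Length; p++)
{
    parsers[p] = new Thread(() =>
    {
        foreach (var file in fileQueue.GetConsumingEnumerable())
            ParseAndSaveFile(file.Item1, file.Item2);          // hypothetical: parseLine per line, then save the collection
    });
    parsers[p].Start();
}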
If you could specify the platform and provide your parsing code, we may be able to help you optimize it. At the moment all we can really say is that yes, it sounds like you're CPU bound.
That long for only 300 MB is bad.
Different things could be impacting performance depending on the situation, but typically reading the hard disk would still be the biggest bottleneck unless something intense is going on during the parsing, which seems to be the case here, because it only takes several seconds to read 300MB from a hard disk (unless it's badly fragmented, maybe).
If you have some inefficient algorithm in the parsing, then picking or coming up with a better algorithm would probably be more beneficial. If you absolutely need that algorithm and there's no algorithmic improvement available, it sounds like you might be stuck.
Also, don't try to use multithreading to read and write at the same time; you'll likely slow things way down due to increased seeking.
Given that you think this is a CPU bound task, you should see some overall increase in throughput with separate IO threads (since otherwise your only processing thread would block waiting for IO during disk read/write operations).
Interestingly I had a similar issue recently and did see a significant net improvement by running separate IO threads (and enough calculation threads to load all CPU cores).
You don't state your platform, but I used the Task Parallel Library and a BlockingCollection for my .NET solution and the implementation was almost trivial. MSDN provides a good example.
UPDATE:
As Jon notes, the time spent on IO is probably small compared to the time spent calculating, so while you can expect an improvement, the best use of your time may be profiling and improving the calculation itself. Using multiple threads for the calculation will then speed things up significantly.
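A hedged one-liner version of that with the Task Parallel Library mentioned above, reusing toImport and importOperation from the question; the MaxDegreeOfParallelism value is an assumption:

// requires: using System.Threading.Tasks;
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(toImport, options, path =>
{
    importOperation(path);    // read + parse + save one file, as in the question's EDIT3
});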
Hmm.. 300MB of lines that have to be split up into a lot of CAN message objects - nasty! I suspect the trick might be to thread off the message assembly while avoiding excessive disk-thrashing between the read and write operations.
If I was doing this as a 'fresh' requirement (and of course, with my 20/20 hindsight, knowing that CPU was going to be the problem), I would probably use just one thread for reading, one for writing to disk and, initially at least, one thread for the message-object assembly. Using more than one thread for message assembly means the complication of resequencing the objects after processing to prevent the output file being written out of order.
I would define a nice disk-friendly sized chunk class holding arrays of lines and message-object instances, say 1024 of them, create a pool of chunks at startup, 16 say, and shove them onto a storage queue. This controls and caps memory use, greatly reduces new/dispose/malloc/free (it looks like you have a lot of this at the moment!), and improves the efficiency of the disk r/w operations because only large reads and writes are performed (except for the last chunk, which will in general be only partly filled). It also provides inherent flow control (the read thread cannot 'run away', because the pool will run out of chunks and the read thread will block on the pool until the write thread returns some), and it inhibits excess context switching because only large chunks are processed.
The read thread opens the file, gets a chunk from the queue, reads the disk, parses it into lines and shoves the lines into the chunk. It then queues the whole chunk to the processing thread and loops around to get another chunk from the pool. Possibly, the read thread could, on start or when idle, wait on its own input queue for a message class instance that contains the read/write file specs. The write file spec could be propagated through a field of the chunks, supplying the write thread with everything it needs via the chunks. This makes a nice subsystem to which file specs can be queued, and it will process them all without any further intervention.
The processing thread gets chunks from its input queue, splits the lines up into the message objects in the chunk, and then queues the completed, whole chunks to the write thread.
The write thread writes the message objects to the output file and then requeues the chunk to the storage pool queue for re-use by the read thread.
All the queues should be blocking producer-consumer queues.
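A skeletal sketch of that pipeline, assuming .NET 4.0's BlockingCollection for all three queues. The Chunk layout, the pool and chunk sizes, inputPath and the WriteMessages helper are illustrative guesses at the design described above, not code from the question:

// requires: using System.Collections.Concurrent; using System.IO; using System.Threading;
class Chunk
{
    public const int Size = 1024;
    public string[] Lines = new string[Size];        // raw lines read from disk
    public CANMessage[] Messages = new CANMessage[Size];
    public int Count;                                 // how many slots are filled
}

// Fixed pool of chunks caps memory use and gives flow control: the reader
// blocks on pool.Take() until the writer recycles a chunk.
var pool    = new BlockingCollection<Chunk>();
var toParse = new BlockingCollection<Chunk>();
var toWrite = new BlockingCollection<Chunk>();
for (int i = 0; i < 16; i++) pool.Add(new Chunk());

var readThread = new Thread(() =>
{
    Chunk c = pool.Take();
    foreach (string line in File.ReadLines(inputPath))       // hypothetical input file spec
    {
        c.Lines[c.Count++] = line;
        if (c.Count == Chunk.Size) { toParse.Add(c); c = pool.Take(); }
    }
    if (c.Count > 0) toParse.Add(c); else pool.Add(c);
    toParse.CompleteAdding();
});

var parseThread = new Thread(() =>
{
    foreach (Chunk c in toParse.GetConsumingEnumerable())
    {
        for (int i = 0; i < c.Count; i++)
            c.Messages[i] = parseLine(c.Lines[i]);            // parseLine from the question; may return null
        toWrite.Add(c);
    }
    toWrite.CompleteAdding();
});

var writeThread = new Thread(() =>
{
    foreach (Chunk c in toWrite.GetConsumingEnumerable())
    {
        WriteMessages(c);                                     // hypothetical: persist c.Messages[0..Count)
        c.Count = 0;
        pool.Add(c);                                          // recycle for the reader
    }
});

readThread.Start(); parseThread.Start(); writeThread.Start();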
One issue with threaded subsystems is completion notification. When the write thread has written the last chunk of a file, it probably needs to do something. I would probably fire an event with the last chunk as a parameter, so that the event handler knows which file has been completely written. I would probably do something similar for error notifications.
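A hedged example of that notification, assuming the chunks carry the output file spec and a flag marking the final chunk of a file (both invented fields, not part of anything above):

// Raised by the write thread; the argument is the file spec that just finished.
public event Action<string> FileCompleted;

// inside the write thread's loop, after persisting a chunk:
if (c.IsLastChunkOfFile && FileCompleted != null)
    FileCompleted(c.WriteFileSpec);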
If this is not fast enough, you could try:
1) Ensure that the read and write threads cannot be preempted in favour of each other during chunk disk operations, by using a mutex. If your chunks are big enough, this probably won't make much difference.
2) Use more than one processing thread. If you do this, chunks may arrive at the write thread 'out of order'. You would maybe need a local list, and perhaps some sort of sequence number in the chunks, to ensure that the disk writes are correctly ordered (see the sketch after this list).
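A rough sketch of that resequencing on the write thread, assuming each chunk is stamped with a SequenceNumber by the read thread (another invented field) and reusing the toWrite and pool queues from the sketch above:

// requires: using System.Collections.Generic;
// Hold out-of-order chunks until the next expected sequence number arrives.
var pending = new Dictionary<long, Chunk>();
long nextToWrite = 0;

foreach (Chunk c in toWrite.GetConsumingEnumerable())
{
    pending[c.SequenceNumber] = c;
    while (pending.ContainsKey(nextToWrite))      // drain everything now in order
    {
        Chunk ready = pending[nextToWrite];
        pending.Remove(nextToWrite);
        WriteMessages(ready);                     // hypothetical persist step
        ready.Count = 0;
        pool.Add(ready);                          // recycle to the reader
        nextToWrite++;
    }
}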
Good luck, whatever design you come up with..
Rgds,
Martin