Multithreaded application on the server is slower than single-threaded (unlike in the JUnit test)
I've switched my application from a single-threaded to a multithreaded routine.
This works fine in the JUnit tests. When running with 10 threads, the test needs 195 ms to complete, and when running with only one thread the application takes 406 ms to finish. So there clearly is a performance advantage.
But when running it on the server, the application now takes much longer than when it was single-threaded.
Basically, my application reads a line from a CSV file, puts one of its values into a set, and prints the line to another file.
The input file in the JUnit tests is about 35 lines long; the one on the server is about 6 000 000 lines long.
The set in which those values are put is a synchronized HashSet which can contain Long objects.
I'm monitoring my application with the Java VisualVM but unfortunately I don't know what to look for.
Do you have any hints for me on how to solve this performance crisis?
P.S.: Most of the time my threads are marked as waiting, but I don't know if they are really waiting or if they are just too fast for the Java VisualVM to display it.
To further clarify my routine: I read the file single-threaded, but as soon as a line is read I pass the resulting object to a Runnable that puts it into a set and prints it to another file. Meanwhile, the next lines are read and passed to other threads.
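For readers who want something concrete to reason about, the routine described above might look roughly like the following. This is only a sketch under assumptions, not the asker's code: the class name CsvPipelineSketch, the choice of the first CSV column as the Long value, and the use of Executors.newFixedThreadPool are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical reconstruction: one reader thread, a pool of workers,
// a synchronized set and a synchronized write method.
class CsvPipelineSketch {
    private final Set<Long> seen = Collections.synchronizedSet(new HashSet<Long>());
    private final BufferedWriter out;

    CsvPipelineSketch(Writer target) {
        this.out = new BufferedWriter(target);
    }

    // Every worker funnels through this single lock, so writes are serialized.
    private synchronized void writeLine(String line) {
        try {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Set<Long> run(Reader source, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {   // reading stays single-threaded
                final String current = line;
                pool.execute(() -> {
                    // Assumption: the value of interest is the first column.
                    seen.add(Long.parseLong(current.split(",")[0].trim()));
                    writeLine(current);
                });
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            synchronized (this) {
                out.flush();
            }
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();   // no-op if the pool already terminated
        }
    }
}
```

Note that the per-line work here is tiny compared with the cost of handing a task to the pool and taking two locks, and that the output order is no longer guaranteed to match the input order.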
As I can see in my log file, the threads are doing something and aren't just waiting. But there are certain jumps, periods longer than 100 ms where nothing is happening.
One of those jumps:
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 7070927
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 9058759
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 7030928
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 15301035
2011-04-08 12:27:16,684 DEBUG [Thread-10] runnables.Runner - 7700929
2011-04-08 12:27:16,684 DEBUG [Thread-10] runnables.Runner - 17116545
2011-04-08 12:27:16,685 DEBUG [Thread-10] runnables.Runner - 4933581
2011-04-08 12:27:16,685 DEBUG [Thread-10] runnables.Runner - 2861116
Note: No GC happened at that time.
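Jumps like this can be located mechanically instead of by eye. The following is a minimal sketch, assuming every log line starts with a timestamp in the format shown above (yyyy-MM-dd HH:mm:ss,SSS); the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports gaps between consecutive log timestamps.
class LogGapSketch {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int STAMP_LENGTH = 23;   // length of "2011-04-08 12:27:16,580"

    // Returns every gap, in milliseconds, that is at least thresholdMillis long.
    static List<Long> gapsAtLeast(List<String> logLines, long thresholdMillis) {
        List<Long> gaps = new ArrayList<Long>();
        LocalDateTime previous = null;
        for (String line : logLines) {
            if (line.length() < STAMP_LENGTH) {
                continue;   // skip lines without a full timestamp
            }
            LocalDateTime current =
                    LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
            if (previous != null) {
                long millis = Duration.between(previous, current).toMillis();
                if (millis >= thresholdMillis) {
                    gaps.add(millis);
                }
            }
            previous = current;
        }
        return gaps;
    }
}
```

Run over the eight lines above with a threshold of 100, this reports a single gap of 104 ms, between 12:27:16,580 and 12:27:16,684.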
As written in a comment below: I am using a thread pool. My threads are fighting over the same output file. They all write through a synchronized method.
Even if I reduce the size of my thread pool to one, the performance is still horrible. Nothing compared to the previous implementation. Wouldn't that rule out things like IO dependency or thread switching?
I've now modified my code so that nearly nothing is done inside the Runnable. No Set, no writing. Just one log statement. And still I get those jumps.
So I rule out the writing or Set problem proposed by some. And when running only one thread, I also got these idle times. So thread switching also doesn't seem to be the problem.
Comments (3)
Your test file is very small, so it's likely read completely by any read-ahead layer in the I/O stack. That makes the whole execution CPU-bound. With more threads, you use more CPUs and get it done faster.
The real file, OTOH, is much longer, so the problem becomes IO-bound. The CPUs spend most of the time waiting for the data to be read. On a single thread there's no contention and the IO is probably more linear, while the multithreaded version is more likely to generate lots of disk seeks (by far the slowest operation you can do on today's hardware).
As a rule of thumb, if you read data from disk or network and don't do heavy processing on it, it's better to go single-threaded.
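To make that rule of thumb concrete, here is a minimal single-threaded sketch of the job described in the question. It assumes, purely for illustration, that the Long value sits in the first CSV column; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical single-threaded variant: read, record, write, in one loop.
class SequentialCsvSketch {
    static Set<Long> process(Reader source, Writer target) {
        Set<Long> seen = new HashSet<Long>();   // plain set, no locking needed
        try (BufferedReader in = new BufferedReader(source);
             BufferedWriter out = new BufferedWriter(target)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumption: the value of interest is the first column.
                seen.add(Long.parseLong(line.split(",")[0].trim()));
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```

With buffered streams the reads and writes stay sequential, and there is no lock or task hand-off on the per-line path.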
The "jumps" you are referring to are the switching times between the threads. Because overall execution time is limited, the execution time for one thread becomes smaller the more threads you have. If you have too many threads, your scheduler ends up just switching threads and no thread does any work. Switching from one thread to another costs a certain fixed amount of time. If your threads don't use more than one core and do the exact same thing, then you end up with worse speed when comparing multithreaded with single-threaded.
I don't know exactly what the problem was, but it seems that it was caused by a bad implementation of the Executor interface. I'm now using [...] and everything is working fine.
[...] 17.12 min
10 threaded routine: 13.45 min
I found the bad piece of code: [...] was invoked when the thread queue was full.
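The post does not show the replacement executor or the call that ran on a full queue, so the following is not the author's fix. It is only a sketch of one standard-library way to handle a full work queue: a ThreadPoolExecutor with a bounded queue and CallerRunsPolicy, which makes the submitting thread run the task itself instead of pausing. The class name, method names and parameter values are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: bounded queue with back-pressure on the producer.
class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                    // fixed number of workers
                0L, TimeUnit.MILLISECONDS,           // no keep-alive for extra threads
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                // When the queue is full, the submitting thread runs the task.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits taskCount trivial tasks and returns how many actually ran.
    static int runTasks(int taskCount, int threads, int queueCapacity) {
        ThreadPoolExecutor pool = newBoundedPool(threads, queueCapacity);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }
}
```

A call such as BoundedPoolSketch.runTasks(1000, 4, 16) completes all tasks even though the queue holds only 16 at a time, because the overflow is executed by the caller rather than rejected or delayed.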