Java performance on two different servers
Not sure if this question belongs here or on Server Fault, but it's Java-related, so here it is:
I have two servers with very similar hardware:
- server1 is an Oracle/Sun x86 with dual X5670 CPUs (2.93 GHz, 4 cores each) and 12 GB RAM.
- server2 is a Dell R610 with dual X5680 CPUs (3.3 GHz, 6 cores each) and 16 GB RAM.
Both are running Solaris x86 with the exact same configuration.
Both have Turbo Boost enabled and hyper-threading disabled.
server2 should therefore be SLIGHTLY faster than server1.
I'm running the following short test program on both platforms.
    import java.io.*;

    public class TestProgram {
        public static void main(String[] args) {
            new TestProgram();
        }

        public TestProgram() {
            try {
                PrintWriter writer = new PrintWriter(new FileOutputStream("perfs.txt", true), true);
                for (int i = 0; i < 10000; i++) {
                    long t1 = System.nanoTime();
                    System.out.println("0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop");
                    long t2 = System.nanoTime();
                    writer.println((t2 - t1));
                    //try {
                    //    Thread.sleep(1);
                    //}
                    //catch (Exception e) {
                    //    System.out.println("thread sleep exception");
                    //}
                }
            }
            catch (Exception e) {
                e.printStackTrace(System.out);
            }
        }
    }
I open perfs.txt and average the results (nanoseconds per println call), which gives:
- server1: average = 1664, trim 10% = 1615
- server2: average = 1510, trim 10% = 1429
This is roughly the expected result (server2 performance > server1 performance).
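For reference, the two summary figures could be computed from perfs.txt along these lines. This is a minimal sketch, not the program the poster used: the class name PerfStats is hypothetical, and it assumes that "trim 10%" means dropping the lowest and highest 10% of the samples before averaging (the post does not say how the trim was done).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: summarizes the nanosecond samples written by TestProgram.
class PerfStats {
    // Arithmetic mean of all samples (NaN for an empty array).
    static double mean(long[] samples) {
        if (samples.length == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Trimmed mean. ASSUMPTION: `fraction` of the samples is removed from
    // EACH end of the sorted data before averaging.
    static double trimmedMean(long[] samples, double fraction) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int cut = (int) (sorted.length * fraction);
        return mean(Arrays.copyOfRange(sorted, cut, sorted.length - cut));
    }

    public static void main(String[] args) throws IOException {
        // One sample per line, as written by writer.println(t2 - t1).
        String file = args.length > 0 ? args[0] : "perfs.txt";
        long[] samples = Files.readAllLines(Paths.get(file)).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .mapToLong(Long::parseLong)
                .toArray();
        System.out.printf("samples = %d, average = %.0f, trim 10%% = %.0f%n",
                samples.length, mean(samples), trimmedMean(samples, 0.10));
    }
}
```

A trimmed mean is less sensitive than the plain average to a handful of very slow calls, which is why the gap between the two figures is itself informative.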
Now I uncomment the Thread.sleep(1) part and test again. The results become:
- server1: average = 27598, trim 10% = 26583
- server2: average = 52320, trim 10% = 39359
This time server2 performance < server1 performance.
That doesn't make any sense to me...
Obviously, I'm looking for a way to improve server2's performance in the second case. There must be some configuration difference, but I don't know which one.
The OSes are identical, and the Java versions are identical.
Could it be linked to the number of cores?
Maybe it's a BIOS setting? Although the BIOSes are different (AMI vs. Dell), the settings seem pretty similar.
I'll update the Dell's BIOS soon and retest, but I would appreciate any insight...
Thanks
Comments (4)
I would try a different test program. Try running something like this.
That's the timer; now here's the main:
I think this is a good way to test which system is truly running faster. Try this and let me know how it goes.
OK, I have a theory: the Thread.sleep() prevents the HotSpot compiler from kicking in. Because you have a sleep, it assumes the loop isn't "hot", i.e. that it doesn't matter too much how efficient the code in the loop is (because, after all, your sleep's only purpose could be to slow things down).
Hence, when you add a Thread.sleep() inside the loop, the other code in the loop also runs slower.
I wonder whether it would make a difference if you had a loop inside a loop and measured the performance of the inner loop (with the Thread.sleep() only in the outer loop). In that case the compiler might optimize the inner loop (if there are enough iterations).
(This brings up a question: if this code is a test case extracted from production code, why does the production code sleep?)
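The nested-loop experiment suggested here could be sketched roughly as follows. Everything beyond the original test program is an assumption made for illustration: the class name NestedLoopSleepTest, the output file perfs_nested.txt, and the batch sizes (1000 outer iterations of 100 println calls each). Whether this actually changes what the JIT compiler does is the theory being tested, not an established fact.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;

// Hypothetical variant of TestProgram: the sleep sits in the outer loop only,
// and the timed region is an inner loop of consecutive println calls.
class NestedLoopSleepTest {
    static final String LINE =
            "0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop";

    // Returns the average nanoseconds per println for each outer iteration.
    static long[] run(int outer, int inner, long sleepMillis, PrintStream out) {
        long[] perCallNanos = new long[outer];
        for (int i = 0; i < outer; i++) {
            long t1 = System.nanoTime();
            for (int j = 0; j < inner; j++) {
                out.println(LINE);
            }
            long t2 = System.nanoTime();
            perCallNanos[i] = (t2 - t1) / inner;

            if (sleepMillis > 0) {
                try {
                    Thread.sleep(sleepMillis); // outside the timed region
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    break;
                }
            }
        }
        return perCallNanos;
    }

    public static void main(String[] args) throws IOException {
        long[] samples = run(1000, 100, 1, System.out);
        // Same append + autoflush settings as the original program.
        try (PrintWriter writer =
                     new PrintWriter(new FileOutputStream("perfs_nested.txt", true), true)) {
            for (long s : samples) {
                writer.println(s);
            }
        }
    }
}
```

Running it once with sleepMillis set to 1 and once with 0 on each server would show whether the sleep still affects the timed region when it is no longer adjacent to every measured call.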
I actually updated the BIOS on the Dell R610 and made sure all BIOS CPU parameters were set for best low-latency performance (no hyper-threading, etc.).
That solved it. The results with and without Thread.sleep now make sense, and the overall performance of the R610 in both cases is much better than the Sun's.
It appears the original BIOS did not make correct or full use of the Nehalem capabilities (while the Sun did).
You are testing how fast the console updates. This is entirely OS and window dependent. If you run this in your IDE it will be much slower than running it in an xterm. Even which font you use and how big your window is will make a big difference to performance. If your window is closed while you run the test, this will improve performance.
Here is how I would run the same test. This test is self-contained and does the analysis you need.
Prints on a 2.6 GHz Xeon Windows Vista box
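To illustrate the point that the console dominates the measurement, here is a minimal sketch (not the answerer's listing; the class and method names are hypothetical) that times the same println call against System.out and against an in-memory stream. It assumes the median is a reasonable summary, and it reports on System.err so the summary does not get mixed into the timed output.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;

// Hypothetical comparison: println to the console vs. println to memory.
class ConsoleCostTest {
    static final String LINE =
            "0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop0123456789qwertyuiop";

    // Per-call println latencies in nanoseconds for the given stream.
    static long[] timePrintln(PrintStream out, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long t1 = System.nanoTime();
            out.println(LINE);
            long t2 = System.nanoTime();
            nanos[i] = t2 - t1;
        }
        return nanos;
    }

    // Middle element of the sorted samples (the upper middle for even counts).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        int n = 10000;
        long[] console = timePrintln(System.out, n);
        long[] memory = timePrintln(
                new PrintStream(new ByteArrayOutputStream(), true), n);
        System.err.printf("console median = %d ns, in-memory median = %d ns%n",
                median(console), median(memory));
    }
}
```

If the two medians differ widely, most of the time in the original loop is being spent in the terminal or IDE console rather than in Java code.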