为什么此 Java 代码比相同的 C# 代码快 6 倍?
我对 Project Euler 问题 5 有几种不同的解决方案,但是在这个特定的实现中,两种语言/平台之间的执行时间差异引起了我的兴趣。我没有对编译器标志进行任何优化,只是简单的 javac
(通过命令行)和 csc
(通过 Visual Studio)。
这是 Java 代码。 55 毫秒内完成。
public class Problem005b
{
public static void main(String[] args)
{
long begin = System.currentTimeMillis();
int i = 20;
while (true)
{
if (
(i % 19 == 0) &&
(i % 18 == 0) &&
(i % 17 == 0) &&
(i % 16 == 0) &&
(i % 15 == 0) &&
(i % 14 == 0) &&
(i % 13 == 0) &&
(i % 12 == 0) &&
(i % 11 == 0)
)
{
break;
}
i += 20;
}
long end = System.currentTimeMillis();
System.out.println(i);
System.out.println(end-begin + "ms");
}
}
这是相同的 C# 代码。 320ms 内完成
using System;
namespace ProjectEuler05
{
class Problem005
{
static void Main(String[] args)
{
DateTime begin = DateTime.Now;
int i = 20;
while (true)
{
if (
(i % 19 == 0) &&
(i % 18 == 0) &&
(i % 17 == 0) &&
(i % 16 == 0) &&
(i % 15 == 0) &&
(i % 14 == 0) &&
(i % 13 == 0) &&
(i % 12 == 0) &&
(i % 11 == 0)
)
{
break;
}
i += 20;
}
DateTime end = DateTime.Now;
TimeSpan elapsed = end - begin;
Console.WriteLine(i);
Console.WriteLine(elapsed.TotalMilliseconds + "ms");
}
}
}
I have a few different solutions to Project Euler problem 5, but the execution time difference between the two languages/platforms in this particular implementation intrigues me. I didn't do any optimization with compiler flags, just plain javac
(via commandline) and csc
(via Visual Studio).
Here's the Java code. It finishes in 55ms.
public class Problem005b
{
public static void main(String[] args)
{
long begin = System.currentTimeMillis();
int i = 20;
while (true)
{
if (
(i % 19 == 0) &&
(i % 18 == 0) &&
(i % 17 == 0) &&
(i % 16 == 0) &&
(i % 15 == 0) &&
(i % 14 == 0) &&
(i % 13 == 0) &&
(i % 12 == 0) &&
(i % 11 == 0)
)
{
break;
}
i += 20;
}
long end = System.currentTimeMillis();
System.out.println(i);
System.out.println(end-begin + "ms");
}
}
Here is the identical C# code. It finishes in 320ms
using System;
namespace ProjectEuler05
{
class Problem005
{
static void Main(String[] args)
{
DateTime begin = DateTime.Now;
int i = 20;
while (true)
{
if (
(i % 19 == 0) &&
(i % 18 == 0) &&
(i % 17 == 0) &&
(i % 16 == 0) &&
(i % 15 == 0) &&
(i % 14 == 0) &&
(i % 13 == 0) &&
(i % 12 == 0) &&
(i % 11 == 0)
)
{
break;
}
i += 20;
}
DateTime end = DateTime.Now;
TimeSpan elapsed = end - begin;
Console.WriteLine(i);
Console.WriteLine(elapsed.TotalMilliseconds + "ms");
}
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
StopWatch
类。
StopWatch
class.有一些可能的优化。也许 Java JIT 正在执行它们,而 CLR 没有。
优化 #1:
相当于
因此在您的示例中,比较链可以替换为
(但是当然,如果您已经计算了 LCM,那么首先运行该程序就没有什么意义了! )
优化#2:
这也是等价的:
或者
第一个除法可以用掩码替换并与零比较:
第二个除法可以用模逆的乘法替换:
这有点不太可能JIT 正在使用这个,但它是并不像您想象的那么牵强 - 一些 C 编译器以这种方式实现指针减法。
There are a few optimizations possible. Maybe the Java JIT is performing them and the CLR is not.
Optimization #1:
is equivalent to
So in your example the comparison chain could be replaced by
(but of course if you've already calculated the LCM, there's little point in running the program in the first place!)
Optimization #2:
This is also equivalent:
or
The first division can be replaced with a mask and compare against zero:
The second division can be replaced by a multiplication by the modular inverse:
It is somewhat unlikely that the JIT is employing this, but it is not as far-fetched as you might think - some C compilers implement pointer subtraction this way.
让两者变得更接近的关键是确保比较的公平性。
首先,确保与运行调试构建相关的成本,像您一样加载 pdb 符号。
接下来,您需要确保没有计算初始成本。显然,这些是实际成本,对某些人来说可能很重要,但在这种情况下,我们对循环本身感兴趣。
接下来您需要处理平台特定的行为。如果您使用的是 64 位 Windows 计算机,您可能会以 32 位或 64 位模式运行。在 64 位模式下,JIT 在许多方面都不同,通常会大大改变生成的代码。具体来说,我猜测您可以访问两倍的通用寄存器。
在这种情况下,循环的内部部分,当天真地翻译成机器代码时,需要将模测试中使用的常量加载到寄存器中。如果不足以容纳循环中所需的所有内容,则必须将它们从内存中推入。即使来自一级缓存,与将其全部保存在寄存器中相比,这也会是一个重大打击。
在 VS 2010 MS 将默认目标从anycpu更改为x86。我没有像 MSFT 那样的资源或面向客户的知识,所以我不会尝试再次猜测。然而,任何关注您正在做的性能分析之类的事情的人都应该尝试这两种方法。
一旦这些差异得到消除,这些数字似乎就更加合理了。任何进一步的差异可能需要比有根据的猜测更好的方法,而是需要调查生成的机器代码中的实际差异。
我认为对于优化编译器来说,有几件事会很有趣。
这些都是纯粹的猜测,应该被视为无意义的曲折。如果你想知道拆解它。
The key to making these two become closer is to ensure that the comparison is fair.
First of all ensuring that costs associated with running Debug builds, loading pdb symbols as you did.
Next you need to ensure that there are no init costs being counted. Obviously these are real costs, and may matter to some people, but in this instance we are interested in the loop itself.
Next you need to deal with the platform specific behaviour. If you are on a 64bit windows machine you may be running either in 32bit or 64bit mode. In 64bit mode the JIT is different in many respects, often altering the resulting code considerably. Specifically, and I would guess pertinently, you get access to twice as many general purpose registers.
In this case the inner section of the loop, when naively translated into machine code, would need to load into registers the constants used in the modulo tests. If there are insufficient to hold everything needed in the loop then it must push them in from memory. Even coming from level1 cache this would be a significant hit compared to keeping it all in registers.
In VS 2010 MS changed the default target from anycpu to x86. I have nothing like the resources or customer facing knowledge of MSFT so I won't try to second guess that. However anyone looking at anything like the performance analysis you are doing should certainly try both.
Once those disparities are ironed out the numbers seem far more reasonable. Any further differences likely require better than educated guesses, instead they would need investigation into the actual differences in the generated machine code.
There are several things about this I think would be interesting for an optimising compiler.
These are all utter guesses, and should be viewed as the idle meanderings. If you want to know disassemble it.
(从 OP 移出)
将目标从 x86 更改为 anycpu 已降低每次运行的平均执行时间从 282 毫秒降至 84 毫秒。也许我应该把它分成第二个线程?
更新:
感谢下面的 Femaref 指出了一些测试问题,事实上,在遵循他的建议后,时间变短了,这表明 VM 设置时间在 Java 中很重要,但在 C# 中可能不是。在 C# 中,调试符号非常重要。
我更新了代码,让每个循环运行 10,000 次,最后只输出平均毫秒数。我所做的唯一重大更改是 C# 版本,我切换到 [StopWatch 类][3] 以获得更高的分辨率。我坚持使用毫秒,因为它已经足够好了。
结果:
测试更改并不能解释为什么 Java(仍然)比 C# 快得多。 C# 性能更好,但这完全可以通过删除调试符号来解释。如果您阅读 [Mike Two][4] 和我对此 OP 所附评论的交流,您会发现仅通过从“调试”切换到“发布”,我在 C# 代码的五次运行中平均获得了约 280 毫秒。
数字:
全部这使得差异无法解释。事实上,差距变得更糟。 Java 从大约 5.8 倍快到大约 6.2 倍。
(Moved from OP)
Changing the target from x86 to anycpu has lowered the average execution time to 84ms per run, down from 282ms. Maybe I should split this off into a second thread?
UPDATE:
Thanks to Femaref below who pointed out some testing problems, and indeed, after following his suggestions, the times are lower, indicating that VM setup time was significant in Java, but probably not in C#. In C#, it was the debug symbols that were significant.
I updated my code to run each loop 10,000 times, and only output the average ms at the end. The only significant change I made was to the C# version where I switched to the [StopWatch class][3] for greater resolution. I stuck with milliseconds because it's good enough.
Results:
The testing changes don't explain why Java is (still) so much faster than C#. C# performance was better, but this can be explained entirely by removing the debug symbols. If you read [Mike Two][4] and I's exchange on the comments attached to this OP, you'll see I got ~280ms average in five runs of the C# code just by switching from Debug to Release.
Numbers:
All of this leaves the difference unexplained. In fact, the differential got worse. Java went from being ~5.8x faster to ~6.2x faster.
这项任务太短,无法安排适当的时间。您需要运行至少 1000 次,看看会发生什么。看起来像是从命令行运行它们,在这种情况下,您可能会比较两者的 JIT 编译器。尝试将两个按钮放在一个简单的 GUI 中,并让该按钮在返回经过的时间之前至少循环数百次。即使忽略 JIT 编译,操作系统调度程序的粒度也可能会导致时间错乱。
哦,由于 JIT...只计算按钮按下的第二个结果。 :)
This is too short a task to do proper timing for. You need to run both at least 1000 times and see what happens. It kind of looks like you're running these from a command line, in which case you're possibly comparing the JIT compilers for both. Try putting both behind buttons in a simple GUI and have that button loop over this a few hundred times at least before returning the elapsed time. Even ignoring JIT compiling, the timing could be thrown off by the granularity of the OS scheduler.
Oh, and because of JIT... only count the SECOND result of a button press. :)
也许是因为
DateTime
对象的构造比System.currentTimeMillis
昂贵得多。Perhaps because the construction of the
DateTime
objects is much more expensive than theSystem.currentTimeMillis
.在 Java 中我会使用 System.nanoTime()。任何耗时少于 2 秒的测试都应该运行更长时间。值得注意的是,Java 非常擅长优化低效代码或不执行任何操作的代码。更有趣的测试是优化代码。
您正在尝试获得一个无需使用循环即可确定的解决方案。即一个可以用另一种方式解决的问题。
您需要 11 到 20 的因数的乘积,即 2,2,2,2,3,3,5,7,11,13,17,19。将它们相乘,你就得到了答案。
In Java I would use System.nanoTime(). Any test which takes less than 2 seconds should be run for longer. It is worth noting that Java is pretty good at optimising inefficient code or code which does nothing. A more interesting test would be if you optimised the code.
You are trying to get a solution which you can determine without using a loop. i.e. a problem which would be done better another way.
You want the product of the factors of 11 to 20, which are 2,2,2,2,3,3,5,7,11,13,17,19. Multiply these together and you have the answer.