I have a program I ported from C to Java. Both apps use quicksort to order some partitioned data (genomic coordinates).
The Java version runs fast, but I'd like to get it closer to the C version. I am using the Sun JDK v6u14.
Obviously I can't get parity with the C application, but I'd like to learn what I can do to eke out as much performance as reasonably possible (within the limits of the environment).
What sorts of things can I do to test performance of different parts of the application, memory usage, etc.? What would I do, specifically?
Also, what tricks can I implement (in general) to change the properties and organization of my classes and variables, reducing memory usage and improving speed?
EDIT: I am using Eclipse and would prefer free options for any third-party tools. Thanks!
Do not try to outsmart the JVM.
In particular:
- Don't try to avoid object creation for the sake of performance.
- Use immutable objects where applicable.
- Use the scope of your objects correctly, so that the GC can do its job.
- Use primitives where you mean primitives (e.g. a non-nullable int rather than a nullable Integer).
- Use the built-in algorithms and data structures.
- When handling concurrency, use the java.util.concurrent package.
- Correctness over performance: first get it right, then measure, then measure with a profiler, then optimize.
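To make the "primitives" and "built-in algorithms" points concrete, here is a minimal sketch. It assumes each genomic coordinate can be reduced to a non-negative chromosome index and a non-negative position; the question does not show its data layout, so the class name PackedCoordinates and the packing scheme are hypothetical. Packing the pair into one long lets Arrays.sort work on a long[] without allocating an object per element.

```java
import java.util.Arrays;

// Hypothetical sketch: the class name and the bit layout are assumptions,
// not the asker's actual data model.
class PackedCoordinates {
    // High 32 bits: chromosome index; low 32 bits: position.
    // Both values must be non-negative for the numeric order to match
    // the (chromosome, position) order.
    static long pack(int chromosome, int position) {
        return ((long) chromosome << 32) | (position & 0xFFFFFFFFL);
    }

    static int chromosome(long key) {
        return (int) (key >>> 32);
    }

    static int position(long key) {
        return (int) key;
    }

    public static void main(String[] args) {
        long[] keys = { pack(2, 5), pack(1, 100), pack(1, 3) };
        Arrays.sort(keys); // primitive sort: no boxing, no Comparator calls
        for (long key : keys) {
            System.out.println(chromosome(key) + ":" + position(key));
        }
    }
}
```

Whether this beats a sort over objects for real data is something to measure, not assume.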
Obviously: profile, profile, profile. For Eclipse there's TPTP. Here's an article on the TPTP plugin for Eclipse. NetBeans has its own profiler. jvisualvm is nice as a standalone tool. (The entire dev.java.net server seems to be down at the moment, but it is very much an active project.)
The first thing to do is use the library sorting routine, Collections.sort; this requires your data objects to be Comparable (or you can supply a Comparator). This might be fast enough and will definitely provide a good baseline.
General tips:
- Use StringBuilder (not StringBuffer, because every StringBuffer method is synchronized and takes a lock) instead of concatenating String objects.
- Make what you can final; if possible, make your classes completely immutable.
- Use an ArrayList (or even an array) so the memory you're accessing is contiguous instead of potentially fragmented the way it might be with a LinkedList.
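A minimal sketch of that baseline, assuming a coordinate is just a chromosome index plus a start position. The GenomicCoordinate class and its field names are hypothetical stand-ins for the asker's data objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LibrarySortBaseline {
    // Hypothetical data class; the fields are assumptions for illustration.
    static final class GenomicCoordinate implements Comparable<GenomicCoordinate> {
        final int chromosome;
        final int start;

        GenomicCoordinate(int chromosome, int start) {
            this.chromosome = chromosome;
            this.start = start;
        }

        // Order by chromosome, then by start. Explicit comparisons avoid the
        // integer overflow that "a - b" can produce.
        public int compareTo(GenomicCoordinate other) {
            if (chromosome != other.chromosome) {
                return chromosome < other.chromosome ? -1 : 1;
            }
            if (start != other.start) {
                return start < other.start ? -1 : 1;
            }
            return 0;
        }

        @Override
        public String toString() {
            return chromosome + ":" + start;
        }
    }

    // Returns a sorted copy and leaves the input untouched.
    static List<GenomicCoordinate> sorted(List<GenomicCoordinate> input) {
        List<GenomicCoordinate> copy = new ArrayList<GenomicCoordinate>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<GenomicCoordinate> coords = new ArrayList<GenomicCoordinate>();
        coords.add(new GenomicCoordinate(2, 5));
        coords.add(new GenomicCoordinate(1, 100));
        coords.add(new GenomicCoordinate(1, 3));
        System.out.println(sorted(coords));
    }
}
```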
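As a small illustration of the StringBuilder tip, here is a sketch of building one output line in a loop. The join helper and the tab-separated record are hypothetical examples, not part of the original program.

```java
class JoinExample {
    // Appends into one StringBuilder instead of creating a new String
    // on every "+" inside the loop.
    static String join(String[] parts, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "chr1", "100", "200" }, '\t'));
    }
}
```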
Use a profiler.
Use the latest version of the JVM from your provider. Incidentally, Sun's Java 6 update 14 does bring performance improvements.
Measure your GC throughput and pick the best garbage collector for your workload.
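One way to get a rough GC figure from inside the program is the standard java.lang.management API. The sketch below is only an illustration: the allocation loop is a made-up workload standing in for the real sort, and the class name GcReport is hypothetical. Running the JVM with -verbose:gc gives similar information without code changes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcReport {
    // Sums the accumulated collection time (in ms) over all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Made-up workload: allocate short-lived arrays to give the GC work.
        long checksum = 0;
        for (int i = 0; i < 200000; i++) {
            int[] garbage = new int[1024];
            garbage[i % 1024] = i;
            checksum += garbage[i % 1024];
        }

        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.println("checksum " + checksum + ", elapsed " + elapsedMs
                + " ms, of which GC " + gcMs + " ms");

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }

        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("heap in use: " + usedMb + " MB");
    }
}
```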
Don't optimize prematurely.
Measure performance, then optimize.
Use final variables whenever possible. It may allow the JVM to optimize more, and it also makes your code easier to read and maintain.
If you make your objects immutable, you don't have to clone them.
Optimize by changing the algorithm first, then by changing the implementation.
Sometimes you need to resort to old-style techniques, like loop unrolling or caching precalculated values. Keep them in mind: even if they don't look nice, they can be useful.
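The point about immutable objects and cloning can be sketched as follows. Interval and Gene are hypothetical classes invented for the illustration; the idea is only that an object whose fields are all final can be shared freely, so no defensive copy is needed when storing or returning it.

```java
class ImmutableSharing {
    // Immutable: the class is final, all fields are final, there are no setters.
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // A "modification" returns a new object and leaves this one untouched.
        Interval shiftedBy(int offset) {
            return new Interval(start + offset, end + offset);
        }
    }

    static final class Gene {
        private final Interval span;

        Gene(Interval span) {
            this.span = span; // no defensive copy needed
        }

        Interval span() {
            return span; // safe to hand out: callers cannot change it
        }
    }

    public static void main(String[] args) {
        Gene gene = new Gene(new Interval(10, 20));
        Interval moved = gene.span().shiftedBy(5);
        System.out.println(gene.span().start + " " + moved.start);
    }
}
```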
jvisualvm ships with JDK 6 now - that's the reason the link cited above doesn't work. Just type "jvisualvm <pid>", where <pid> is the ID of the process you want to track. You'll get to see how the heap is being used, but you won't see what's filling it up.
If it's a long-running process, you can turn on the -server option when you run. There are a lot of tuning options available to you; that's just one.
Also try tweaking the runtime arguments of the VM - the latest release of the VM for example includes the following flag which can improve performance in certain scenarios.
First caveat: make sure you have done appropriate profiling or benchmarking before embarking on any optimisation work. The results will often enlighten you, and nearly always save you a lot of wasted effort optimising something that doesn't matter.
Assuming that you do need it, you can get performance comparable to C in Java, but it takes some effort. You need to know where the JVM is doing "extra work" and avoid it.
In particular:
- Use primitives such as double, not boxed types such as Double.
If your algorithm is CPU-heavy, you may want to consider taking advantage of parallelisation. You may be able to sort in multiple threads and merge the results back later.
However, this is not a decision to be taken lightly, as writing concurrent code is hard.
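A minimal sketch of "sort in multiple threads and merge", assuming the data is a long[] and two worker threads are enough. The class name TwoWayParallelSort is hypothetical, and this is one simple way to do it, not a tuned implementation. It uses only java.util.concurrent classes available on JDK 6; later JDKs (8 and up) also offer Arrays.parallelSort.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoWayParallelSort {
    // Sorts each half on its own thread, then merges. Returns a new array.
    static long[] sort(long[] data) {
        int mid = data.length / 2;
        final long[] left = Arrays.copyOfRange(data, 0, mid);
        final long[] right = Arrays.copyOfRange(data, mid, data.length);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> l = pool.submit(new Runnable() {
                public void run() { Arrays.sort(left); }
            });
            Future<?> r = pool.submit(new Runnable() {
                public void run() { Arrays.sort(right); }
            });
            // get() waits for completion and makes the workers' writes
            // visible to this thread.
            l.get();
            r.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return merge(left, right);
    }

    // Standard two-way merge of two sorted arrays.
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        long[] data = { 9, 3, 7, 1, 8, 2 };
        System.out.println(Arrays.toString(sort(data)));
    }
}
```

For small inputs the thread overhead will likely outweigh the gain, so this is worth timing against a plain Arrays.sort before adopting it.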
Can't you use the sort functions that are included in the Java library?
You could at least look at the speed difference between the two sorting functions.
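A rough way to look at that difference is a small timing harness. In the sketch below, quicksort is a generic textbook implementation standing in for the ported routine (the asker's code is not shown), and the array size, seed, and round count are arbitrary choices. Each round sorts a fresh copy of the same input; the first rounds include JIT warm-up, so the later rounds are the more representative ones.

```java
import java.util.Arrays;
import java.util.Random;

class SortTiming {
    // Stand-in for the ported routine: a plain quicksort with a middle pivot.
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        quicksort(a, lo, j);
        quicksort(a, i, hi);
    }

    // Times one sort of a fresh copy, so every run sees the same input.
    static long timeNanos(int[] source, boolean useLibrary) {
        int[] copy = source.clone();
        long start = System.nanoTime();
        if (useLibrary) {
            Arrays.sort(copy);
        } else {
            quicksort(copy, 0, copy.length - 1);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt();
        }
        for (int round = 0; round < 10; round++) {
            long custom = timeNanos(data, false);
            long library = timeNanos(data, true);
            System.out.println("round " + round + ": custom " + custom / 1000000
                    + " ms, library " + library / 1000000 + " ms");
        }
    }
}
```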
Methodologically, you have to profile the application and get an idea of which components of your program are time- and memory-intensive; then take a closer look at those components in order to improve their performance (see Amdahl's law).
From a purely technological point of view, you can use a Java-to-native-code compiler, like Excelsior's JET, but I have to note that recent JVMs are really fast, so the VM should not have a significant impact.
Is your sorting code executing only once, e.g. in a commandline utility that just sorts, or multiple times, e.g. a webapp that sorts in response to some user input?
Chances are that performance would increase significantly after the code has been executed a few times because the HotSpot VM may optimize aggressively if it decides your code is a hotspot.
This is a big advantage compared to C/C++.
The VM, at runtime, optimizes code that is used often, and it does that quite well. Performance can actually rise beyond that of C/C++ because of this. Really. ;)
Your custom Comparator could be a place for optimization, though.
Try to check inexpensive stuff first (e.g. int comparison) before more expensive stuff (e.g. String comparison). I'm not sure if those tips apply because I don't know your Comparator.
Use either Collections.sort(list, comparator) or Arrays.sort(array, comparator). The array variant will be a bit faster, see the respective documentation.
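A sketch of the "cheap check first" idea, using a hypothetical Feature class with an int start and a String name. It only applies when the cheap field really is the primary sort key (or when any consistent order will do); swapping the order of the checks otherwise changes the resulting order.

```java
import java.util.Arrays;
import java.util.Comparator;

class CheapFirstComparator {
    // Hypothetical record type; the fields are assumptions for illustration.
    static final class Feature {
        final int start;
        final String name;

        Feature(int start, String name) {
            this.start = start;
            this.name = name;
        }
    }

    static final Comparator<Feature> BY_START_THEN_NAME = new Comparator<Feature>() {
        public int compare(Feature a, Feature b) {
            // Cheap int comparison decides most pairs.
            if (a.start != b.start) {
                return a.start < b.start ? -1 : 1;
            }
            // The more expensive String comparison runs only on ties.
            return a.name.compareTo(b.name);
        }
    };

    public static void main(String[] args) {
        Feature[] features = {
            new Feature(200, "geneB"),
            new Feature(100, "geneZ"),
            new Feature(100, "geneA")
        };
        Arrays.sort(features, BY_START_THEN_NAME);
        for (Feature f : features) {
            System.out.println(f.start + " " + f.name);
        }
    }
}
```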
As Andreas said before: don't try to outsmart the VM.
Perhaps there are routes to better performance other than micro-optimization of the code. How about a different algorithm to achieve what you want your program to do? Maybe a different data structure?
Or trade some disk or RAM space for speed: if you can give up some time upfront while your program loads, you can precompute lookup tables instead of doing the calculations each time, so the processing itself is fast. In other words, trade off the other resources you have available.
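As an illustration of a precomputed lookup table, here is a sketch that fills a small array once at class-load time and then replaces per-character branching with an array read. The nucleotide-complement task and the class name ComplementTable are hypothetical examples chosen to fit the genomic setting; they are not taken from the asker's program.

```java
class ComplementTable {
    // Built once when the class loads; indexed by ASCII code.
    private static final char[] COMPLEMENT = new char[128];

    static {
        for (int c = 0; c < COMPLEMENT.length; c++) {
            COMPLEMENT[c] = (char) c; // default: leave the character unchanged
        }
        COMPLEMENT['A'] = 'T';
        COMPLEMENT['T'] = 'A';
        COMPLEMENT['C'] = 'G';
        COMPLEMENT['G'] = 'C';
    }

    // Walks the sequence backwards and maps each base through the table.
    static String reverseComplement(String seq) {
        char[] out = new char[seq.length()];
        for (int i = 0; i < out.length; i++) {
            char c = seq.charAt(seq.length() - 1 - i);
            out[i] = c < COMPLEMENT.length ? COMPLEMENT[c] : c;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("AACG")); // prints CGTT
    }
}
```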
Here's what I would do, in any language. If samples show that your sort-comparison routine is active a large percentage of the time, you might find a way to simplify it. But maybe the time is going elsewhere. Diagnose first, to see what's broken, before you fix anything. Chances are, if you fix the biggest thing, then something else will be the biggest thing, and so on, until you've really gotten a pretty good speedup.
Profile and tune your Java program and host machine. Most code follows the 80/20 rule: 20% of the code accounts for 80% of the run time, so find that 20% and make it as fast as possible. For example, the article Tuning Java Servers (http://www.infoq.com/articles/Tuning-Java-Servers) describes how to drill down from the command line and then isolate the problem using tools like Java Flight Recorder, Eclipse Memory Analyzer, and JProfiler.