Java Real-Time Performance

Posted on 2024-07-22 07:27:20

I'm working on a Java project that requires very advanced image manipulation. I'm doing most of the manipulation with OpenCV, using JNI to wrap the OpenCV functions I need. I am extremely satisfied with the performance OpenCV gives; the people who wrote the OpenCV code deserve a great deal of credit. This is in sharp contrast to my experience with the code the Java developers wrote.

I started out optimistic about my choice of programming language. My first working iteration of the project works fine, but its performance is nowhere near real time (about 1 frame every 2 seconds). I've done some optimizations of MY code and it's helped a lot: I've pushed the frame rate up to about 10-20 frames per second, which is great, but I'm finding that to go any further I have to rewrite Java's own code to do the same thing 10-20x more efficiently.

I'm appalled at how little attention the developers of Java pay to performance, especially in the media-related classes. I've downloaded OpenJDK and I'm exploring the functions I'm using. For example, the Raster class has a method called getPixels(...) that gets the pixels of the image. I was expecting this to be a highly optimized function, perhaps with several calls to System.arraycopy to further improve performance. Instead, what I found was extremely "Classy" code that calls into 5-6 different classes and 10-20 different methods just to accomplish what I can do in a few lines:

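```java
// Assumes rawFrame is an int[] of packed 0xRRGGBB pixels, data is a byte[],
// and grayFrameData is a float[], all of length n.
for (int i = 0; i < n; i++) {
  int p = rawFrame[i];
  // Keep the channels as ints: casting them to byte would turn 128-255 into negatives.
  int red = (p >> 16) & 0xff;
  int green = (p >> 8) & 0xff;
  int blue = p & 0xff;
  int val = (int) (0.212671f * red + 0.715160f * green + 0.072169f * blue);
  data[i] = (byte) val;        // signed storage; read back with data[i] & 0xff
  grayFrameData[i] = val;      // 0-255 as a float
}
```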

The code above converts an image to grayscale and gets the float pixel data in roughly 1-10 ms. If I do the same with Java's built-in functions, the grayscale conversion alone takes 200-300 ms, and then grabbing the float pixels takes about 50-100 ms. This is unacceptable for real-time performance. Note that to get this speedup I make heavy use of bitwise operators, which the Java developers shy away from.

I understand that they need to handle the general case, but even so, couldn't they at least offer optimized options, or at the very least warn how slowly this code may perform?

My question is: at this late point in development (I already have my first iteration, and now I'm working on a second one that performs closer to real time), should I bite the bullet and switch to C/C++, where I can fine-tune things a lot more, or should I stick with Java and hope things become more real-time friendly, so that I won't have to rewrite already implemented Java code to get a speedup?

I'm really beginning to be disgusted with how "classy" and slow Java really is. The number of classes seems like overkill.

白芷 2024-07-29 07:27:21

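I don't know how much of a performance gain you will get, but if you have a long-running process doing repetitive operations, you should try running the server HotSpot VM with java -server. It performs much better than the client VM that is the default on Windows, which is optimized for fast startup time.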
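
A minimal sketch for confirming which VM a process is actually running on. It reads the standard `java.vm.name` and `java.version` system properties; the `VmCheck` class and the check for the substring "Server" are illustrative assumptions, not an official API. On recent 64-bit JDKs the server VM is usually the only one shipped, so `-server` may change nothing there.

```java
class VmCheck {
    /** Heuristic: HotSpot server builds report a VM name such as "OpenJDK 64-Bit Server VM". */
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        System.out.println("VM: " + vmName + " (Java " + System.getProperty("java.version") + ")");
        System.out.println("Server VM: " + isServerVm(vmName));
    }
}
```

Run it as `java -server VmCheck` and compare with a plain `java VmCheck` to see whether the flag has any effect on your installation.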

镜花水月 2024-07-29 07:27:21

It is not clear that you are really asking about realtime. There is a difference between realtime and real fast. For real fast, it is sufficient to consider the average-case behavior; throughput is the main concern. Realtime means being able to finish some task within a fixed amount of time each and every time. Of course, there are applications that need both.
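
To make the distinction concrete, here is a minimal sketch that records per-frame processing times and reports both the mean (the throughput view) and the number of frames that missed a deadline (the realtime view). The 50 ms budget, the `FrameStats` class and the `processFrame` placeholder workload are illustrative assumptions; measuring deadline misses on a conventional JVM shows whether you have a problem, it does not give a realtime guarantee.

```java
import java.util.Arrays;

class FrameStats {
    /** Mean frame time in nanoseconds: the number a throughput-oriented view cares about. */
    static double meanNanos(long[] frameNanos) {
        return Arrays.stream(frameNanos).average().orElse(0.0);
    }

    /** Number of frames slower than the deadline: the number a realtime view cares about. */
    static long deadlineMisses(long[] frameNanos, long deadlineNanos) {
        return Arrays.stream(frameNanos).filter(t -> t > deadlineNanos).count();
    }

    public static void main(String[] args) {
        long deadline = 50_000_000L;             // hypothetical 50 ms budget (20 fps)
        long[] times = new long[200];
        for (int i = 0; i < times.length; i++) {
            long start = System.nanoTime();
            processFrame();
            times[i] = System.nanoTime() - start;
        }
        System.out.printf("mean %.2f ms, worst %.2f ms, misses %d%n",
                meanNanos(times) / 1e6,
                Arrays.stream(times).max().orElse(0) / 1e6,
                deadlineMisses(times, deadline));
    }

    /** Hypothetical stand-in for the real per-frame work. */
    static void processFrame() {
        int[] pixels = new int[640 * 480];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = i * 31;
        }
    }
}
```

A system can have an excellent mean and still miss deadlines because of occasional long frames, which is exactly the case the realtime view is about.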

In a conventional Java implementation, such as OpenJDK, the garbage collector is the biggest problem for attaining realtime behavior. This is because the garbage collector can interrupt the program at any point to do its work. My company, aicas, has an implementation of Java that does not require a separate thread for garbage collection. Instead, a bit of GC work is done at allocation time. Effectively, allocation is paid for by marking or sweeping a few blocks for each block allocated. This has required a full reimplementation of the virtual machine.
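
On a conventional JVM you can at least check whether collections happened while a frame was being processed. A minimal sketch using the standard `java.lang.management` API; the `GcProbe` class and the allocation-heavy loop standing in for one frame are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    /** Sum of collection counts over all collectors; a bean returns -1 when the count is undefined. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        // Allocation-heavy stand-in for one frame: many short-lived temporary arrays.
        long checksum = 0;
        for (int i = 0; i < 1_000; i++) {
            byte[] tmp = new byte[64 * 1024];
            checksum += tmp.length;
        }
        long after = totalCollections();
        System.out.println("GCs during frame: " + (after - before) + " (checksum " + checksum + ")");
    }
}
```

Running with `-verbose:gc` gives the same information as log lines, including pause times.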

Compilation is another point where realtime Java differs from conventional Java implementations. Realtime Java technology tends to use static or Ahead-of-Time (AoT) compilation instead of JIT compilation. JIT may be okay for your application, as you may be able to tolerate the "warm-up" time a conventional VM needs to compile the most used classes. If so, then you probably do not have realtime requirements, just throughput ones.
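
If JIT warm-up is acceptable, timings should be taken after it, otherwise they mostly measure the interpreter. A minimal sketch that runs a kernel untimed for a number of iterations before timing it; the `WarmUp` class, the iteration counts and the fixed-point grayscale kernel are illustrative assumptions. A harness such as JMH (the OpenJDK microbenchmark harness) handles warm-up and dead-code elimination far more rigorously.

```java
import java.util.function.LongSupplier;

class WarmUp {
    static volatile long blackhole;   // results are published here so the work is not dead code

    /** Runs the kernel `warmup` times untimed, then returns the mean nanoseconds over `measured` runs. */
    static double timeAfterWarmUp(LongSupplier kernel, int warmup, int measured) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += kernel.getAsLong();   // untimed: gives the JIT a chance to compile the hot code
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += kernel.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        int[] frame = new int[640 * 480];             // hypothetical VGA-sized frame
        for (int i = 0; i < frame.length; i++) {
            frame[i] = i * 0x010101;
        }
        LongSupplier grayKernel = () -> {
            long sum = 0;
            for (int p : frame) {
                sum += (54 * ((p >> 16) & 0xff) + 183 * ((p >> 8) & 0xff) + 19 * (p & 0xff)) >> 8;
            }
            return sum;
        };
        System.out.printf("%.3f ms per frame%n", timeAfterWarmUp(grayKernel, 50, 100) / 1e6);
    }
}
```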

If you are interested in ensuring that frame decoding is not interrupted by garbage collection, then it would make sense to use a realtime implementation of Java, and perhaps AoT compilation as well. The Real-Time Specification for Java (RTSJ) also provides other support for realtime and embedded programming, such as RealtimeThread, AsyncEventHandler, and RawMemoryAccess.

Of course, obtaining good performance, whether realtime or real fast, requires attention to detail. The overuse of temporary objects is not helpful. Allocation always entails extra cost, so it should be minimized. This is a major challenge for functional languages, which do not allow changing the state of objects. However, one should take care to understand the critical paths of the code being written to avoid unnecessary optimizations. Profiling is essential for understanding where optimization effort is best spent.
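
One common way to minimize allocation in a frame loop is to allocate the output buffer once and reuse it for every frame. A minimal sketch, assuming fixed-size frames of packed 0xRRGGBB ints; the `GrayConverter` class is hypothetical, and the weights 54/183/19 are a fixed-point (scaled by 256) approximation of the luminance weights in the question's loop.

```java
class GrayConverter {
    private final byte[] gray;   // allocated once, reused for every frame

    GrayConverter(int pixelCount) {
        this.gray = new byte[pixelCount];
    }

    /** Converts one frame into the reused buffer; nothing is allocated per call. */
    byte[] convert(int[] rgb) {
        if (rgb.length != gray.length) {
            throw new IllegalArgumentException("frame size changed");
        }
        for (int i = 0; i < rgb.length; i++) {
            int p = rgb[i];
            // (54 R + 183 G + 19 B) / 256 approximates 0.2127 R + 0.7152 G + 0.0722 B
            gray[i] = (byte) ((54 * ((p >> 16) & 0xff) + 183 * ((p >> 8) & 0xff) + 19 * (p & 0xff)) >> 8);
        }
        return gray;   // overwritten by the next call; copy it if a frame must be kept
    }
}
```

The trade-off is that the returned array is shared, so callers that keep frames around must copy them explicitly.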

榕城若虚 2024-07-29 07:27:21

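Premature optimization is the root of all evil.

Rather than complaining, write a set of optimized libraries and publish them; but creating a "reference" Java implementation that is pre-optimized for some nonexistent target would be a mistake.

The point of a reference implementation is to be easy-to-understand, maintainable code, and it has to be. I think vendors have always been expected to profile this easy-to-understand version and, where necessary, reimplement parts of it for speed.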

青衫负雪 2024-07-29 07:27:21

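In addition to what others have said, you can also contribute optimizations to the JDK. If you can provide solid optimizations that don't sacrifice generality or readability, I would expect your patch to be accepted into a future JDK release.

So you don't have to just hope the JDK gets better. You can help make it happen.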

森林散布 2024-07-29 07:27:21

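As I understand it, the latest version of Java (or possibly JavaFX) has methods that give you access to advanced features of the system's video hardware. Sorry to be so vague: I believe I heard about it on the Java Posse, and since I'm stuck with Java 1.3 I never really had a chance to look into it, but I do remember hearing something like that.

Here's something about it, but it looks like it will only be in Java 7 :(

It also looks like it will initially support only stream playback and basic stream manipulation, but maybe the "wait, Java will improve" approach might actually work.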

很糊涂小朋友 2024-07-29 07:27:21

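What's stopping you from writing optimized versions of the methods you want to use instead of using the built-in ones? And if that isn't possible, why not write those objects in a more native language and import them into your existing application?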
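
As one example of writing your own version instead of going through `Raster.getPixels(...)`: a `BufferedImage` of type `TYPE_INT_RGB` created with the standard constructor is backed by a `DataBufferInt`, whose `int[]` can be read directly. A minimal sketch; the `DirectPixels` class is hypothetical, and grabbing the backing array this way may stop Java2D from keeping an accelerated cached copy of the image.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

class DirectPixels {
    /** Returns the live pixel array of a TYPE_INT_RGB image: no copy, and writes show up in the image. */
    static int[] pixelsOf(BufferedImage img) {
        if (img.getType() != BufferedImage.TYPE_INT_RGB) {
            throw new IllegalArgumentException("expected TYPE_INT_RGB");
        }
        return ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB);
        img.setRGB(1, 0, 0x336699);
        int[] px = pixelsOf(img);   // row-major for a constructor-created image: index = y * width + x
        System.out.printf("%06X%n", px[1]);
    }
}
```

The resulting `int[]` can be fed straight into a hand-written loop like the one in the question, with no per-call copying.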

好久不见√ 2024-07-29 07:27:20

I've done computer vision work with Java, and I think it is perfectly usable for computer vision and realtime stuff, you just have to know how to use it.

Potential Optimizations:

If you need help optimizing your code, I'd be glad to assist -- for example, I can tell you that you will probably get a performance boost by making a method

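```java
public static final int getGrayScale(final int pixelRGB) {
    // Weighted R, G, B sum (sRGB / Rec. 709 luminance); the cast is required because the sum is a float.
    return (int) (0.212671f * ((pixelRGB >> 16) & 0xff)
                + 0.715160f * ((pixelRGB >> 8) & 0xff)
                + 0.072169f * (pixelRGB & 0xff));
}
```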

and using this in your for{pixels} loop. By using a method call, the JVM can much more heavily optimize this operation, and can probably optimize the for loop more too.

If you've got RAM to burn, you can create a static final lookup table of output grayscale bytes for all possible 24-bit pixel colors. This will take ~16 MB of RAM, but then you don't have to do any floating-point arithmetic, just a single array access. This may be faster, depending on which JVM you are using and whether it can optimize away array bounds checking.
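
The lookup-table idea can be sketched as follows. It assumes packed 0xRRGGBB input and uses the same floating-point weights as the question's loop; the `GrayLut` class is hypothetical. The table holds 2^24 bytes (16 MiB) and is filled once when the class is first used, which takes a noticeable moment.

```java
class GrayLut {
    // One output byte per possible 24-bit color: 1 << 24 = 16,777,216 entries (16 MiB).
    private static final byte[] TABLE = new byte[1 << 24];

    static {
        for (int rgb = 0; rgb < TABLE.length; rgb++) {
            float g = 0.212671f * ((rgb >> 16) & 0xff)
                    + 0.715160f * ((rgb >> 8) & 0xff)
                    + 0.072169f * (rgb & 0xff);
            TABLE[rgb] = (byte) g;   // float -> int (truncating) -> byte; read back with & 0xff
        }
    }

    /** Grayscale value 0-255 for a packed pixel; any alpha byte is masked off. */
    static int gray(int pixelRGB) {
        return TABLE[pixelRGB & 0xFFFFFF] & 0xff;
    }
}
```

Whether this beats the arithmetic version depends on cache behavior: a 16 MiB table does not fit in most CPU caches, so random-looking pixel colors can make each lookup a memory access.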

Places to find similar, faster image processing code:

I would strongly suggest that you take a look at the code for the ImageJ image processing app and its libraries, specifically ij.process.TypeConverter. Just like your code, it relies heavily on direct array operations with bit-twiddling and a minimum of extra array creation. The Java2D libraries (part of the standard JRE) and the Java Advanced Imaging (JAI) library provide other ways to process image data directly and quickly without having to roll your own operation every time. For Java2D, you just have to be careful which functions you use.
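
As an example of letting Java2D do the work, drawing an RGB image into a `TYPE_BYTE_GRAY` `BufferedImage` performs the conversion inside the library. A minimal sketch; the `Java2DGray` class is hypothetical, and the gray weighting Java2D applies is its own, so results for colored pixels will not exactly match the question's formula. Whether this path is fast on a given JVM is something to measure rather than assume.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class Java2DGray {
    /** Converts any BufferedImage to 8-bit grayscale using Java2D's built-in conversion. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) {
        BufferedImage rgb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        rgb.setRGB(0, 0, 0xFFFFFF);
        rgb.setRGB(1, 0, 0x000000);
        BufferedImage gray = toGray(rgb);
        // Raw 8-bit samples from band 0 of the gray raster
        System.out.println(gray.getRaster().getSample(0, 0, 0) + " " + gray.getRaster().getSample(1, 0, 0));
    }
}
```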

Why the Java2D libraries are so indirect:

Most of the "class-iness" is due to supporting multiple color models and storage formats (e.g., HSB images, float-based color models, indexed color models). The indirection exists for a reason, and sometimes actually boosts performance: the BufferedImage class (for example) hooks directly into graphics memory in recent VMs to make some operations MUCH faster. The indirection lets it hide this from the user most of the time.
