Is there any way to improve JVM startup speed?

Posted 2024-09-30 08:33:17

It is said that Java is 10x faster than Python in terms of performance, and that's what I see in benchmarks too. But what really drags Java down is the JVM startup time.

This is a test I made:

$ time xlsx2csv.py Types\ of\ ESI\ v2.doc-emb-Package-9
...
<output skipped>
real    0m0.085s
user    0m0.072s
sys     0m0.013s


$ time java -jar -client /usr/local/bin/tika-app-0.7.jar -m Types\ of\ ESI\ v2.doc-emb-Package-9

real    0m2.055s
user    0m2.433s
sys     0m0.078s

Same file (a 12 KB MS XLSX file embedded inside a DOCX), and Python is about 25x faster! WTH!

Java takes 2.055 s.

I know this is all down to startup time, but I need to call it from a script to parse some documents, because I don't want to reinvent the wheel in Python.

But for parsing 10k+ files, that is just not practical.

Is there any way to speed it up? (I already tried the -client option, and it only sped things up a little, about 20%.)

Another idea of mine: run it as a long-running daemon and talk to it locally over UDP or Linux IPC sockets?
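To make the daemon idea concrete, here is a minimal sketch of what it might look like: one JVM stays warm and answers one request line per connection, so the startup cost is paid once rather than per file. Everything in it is an assumption for illustration: the class name ParseDaemon, port 9090, the one-line-in, one-line-out protocol, and the placeholder handler that stands in for the real parsing call. It uses a loopback TCP socket rather than UDP or a Unix-domain socket, purely because that needs nothing beyond the standard library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: one warm JVM answers many requests over a loopback socket.
final class ParseDaemon implements AutoCloseable {
    private final ServerSocket server;
    private final Function<String, String> handler;

    // Port 0 asks the OS for any free port; the handler stands in for the real parser.
    ParseDaemon(int port, Function<String, String> handler) throws IOException {
        // Bind to loopback only, so other hosts cannot reach the daemon.
        this.server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        this.handler = handler;
    }

    int port() {
        return server.getLocalPort();
    }

    // Serves one request line per connection until close() is called.
    void serve() {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 BufferedReader in = reader(client);
                 PrintWriter out = writer(client)) {
                String path = in.readLine();
                if (path != null) {
                    out.println(handler.apply(path));
                }
            } catch (IOException e) {
                // A failed connection is skipped. Once the server socket is closed,
                // accept() throws and the loop condition ends the loop.
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    // Client side: send one path, read back one reply line.
    static String request(int port, String path) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(path);
            return in.readLine();
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true so println() sends the line immediately.
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder handler; a real daemon would invoke the parsing library here.
        // Port 9090 is an arbitrary choice for this sketch.
        try (ParseDaemon daemon = new ParseDaemon(9090, path -> "parsed " + path)) {
            daemon.serve();
        }
    }
}
```

A calling script would then open a connection per file (or reuse one, with a richer protocol) instead of spawning a new JVM each time. Real parser output spans many lines, so the single-line reply here would need a length prefix or an end marker in practice.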

纵山崖 2024-10-07 08:33:17

Try Nailgun.

Note: I don't personally use it.


独夜无伴 2024-10-07 08:33:17

I refer you to Matthew Gilliard's (mjg) blog post on the topic. Any code examples below come straight from there. I won't include timing examples, partly to keep this short and partly to encourage you to visit his page. Matthew works on the Fn Project, so he's very interested in figuring out how to keep startup times low.

Apparently there are a few ways to do it, and some are pretty easy as well. The core idea is that you cache the JVM's initialization cycle instead of executing it on every startup.

Class Data Sharing (CDS)

CDS caches the deterministic (hardware-dependent) startup process of the JDK. It's the easiest and oldest trick in the book (since 1.5, I believe), and not very well known.

From Oracle

When the JVM starts, the shared archive is memory-mapped to allow sharing of read-only JVM metadata for these classes among multiple JVM processes. The startup time is reduced thus saving the cost because restoring the shared archive is faster than loading the classes.

You can create the dump manually by running

⇒ java -Xshare:dump
Allocated shared space: 50577408 bytes at 0x0000000800000000
Loading classes to share ...
// ...snip ...
total   :  17538717 [100.0% of total] out of  46272512 bytes [ 37.9% used]

...and then use it with

java -Xshare:on HelloJava
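If you want to see what a given -Xshare setting does on your own machine, a small probe like the following may help. It is only a sketch: the class name StartupProbe and its helper methods are made up for illustration, and the number it prints covers only the time from JVM start until the probe runs (loading the management classes adds a little overhead of its own), so treat it as a rough comparison rather than a benchmark.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

// Hypothetical probe: prints the -Xshare setting in effect on the command line
// and how long the JVM has been up by the time main() runs.
final class StartupProbe {

    // Milliseconds between the JVM start time recorded by the runtime and now.
    static long millisSinceJvmStart() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return System.currentTimeMillis() - runtime.getStartTime();
    }

    // Returns the value of the last -Xshare:<mode> argument, or "default" if none was passed.
    static String xshareSetting(List<String> jvmArgs) {
        String mode = "default";
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xshare:")) {
                mode = arg.substring("-Xshare:".length());
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("Xshare=" + xshareSetting(runtime.getInputArguments())
                + ", ms since JVM start=" + millisSinceJvmStart());
    }
}
```

Compile it once with javac StartupProbe.java, then compare runs such as java -Xshare:off StartupProbe and java -Xshare:on StartupProbe, ideally wrapped in time as in the question.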

AOT: Ahead of Time Compilation (Java 9+; the experimental jaotc tool was removed again in JDK 17)

From mjg's blog

Where CDS does some parts of classloading of core classes in advance, AOT actually compiles bytecode to native code (an ELF-format shared-object file) in advance, and can be applied to any bytecode.

Use SubstrateVM (Java 8+)

Not in the blog post but demonstrated during the talk he gave a few days ago.

From the readme:

Substrate VM is a framework that allows ahead-of-time (AOT) compilation of Java applications under closed-world assumption into executable images or shared objects (ELF-64 or 64-bit Mach-O).
