Is there any way to speed up JVM startup?

Posted on 2024-09-30 08:33:17

It is said that Java is 10x faster than Python in terms of performance. That's what I see from benchmarks too. But what really drags Java down is the JVM startup time.

This is a test I made:

$time xlsx2csv.py Types\ of\ ESI\ v2.doc-emb-Package-9
...
<output skipped>
real    0m0.085s
user    0m0.072s
sys     0m0.013s


$time java  -jar -client /usr/local/bin/tika-app-0.7.jar -m Types\ of\ ESI\ v2.doc-emb-Package-9

real    0m2.055s
user    0m2.433s
sys     0m0.078s

Same file, a 12 KB MS XLSX file embedded inside a Docx, and Python is roughly 25x faster (0.085 s vs. 2.055 s)! WTH!!

It takes 2.055 seconds for Java.

I know it is all due to startup time, but I need to call it from a script to parse some documents, and I do not want to reinvent the wheel in Python.

For parsing 10k+ files, though, that is just not practical.

Is there any way to speed it up? (I already tried the -client option, and it only sped things up a little, about 20%.)

Another idea of mine: run it as a long-running daemon and communicate locally over UDP or Linux IPC sockets?

Comments (6)

纵山崖 2024-10-07 08:33:17

Try Nailgun.

Note: I don't use it personally.

独夜无伴 2024-10-07 08:33:17

I refer you to Matthew Gilliard's (mjg) blog post on the topic. Any code examples below come straight from there. I won't include timing examples partly to keep this short and partly to induce you to visit his page. Matthew works on the Fn Project so he's very interested in figuring out how to keep startup times low.

Apparently there are a few ways to do it, and some are pretty easy as well. The core idea is that you cache the JVM's initialization cycle instead of executing it on every startup.

Class Data Sharing (CDS)

CDS caches the deterministic (hardware-dependent) startup process of the JDK. It's the easiest and oldest (since 1.5, I believe) trick in the book, and not a very well-known one.

From Oracle

When the JVM starts, the shared archive is memory-mapped to allow sharing of read-only JVM metadata for these classes among multiple JVM processes. The startup time is reduced thus saving the cost because restoring the shared archive is faster than loading the classes.

You can create the dump manually by running

⇒ java -Xshare:dump
Allocated shared space: 50577408 bytes at 0x0000000800000000
Loading classes to share ...
// ...snip ...
total   :  17538717 [100.0% of total] out of  46272512 bytes [ 37.9% used]

...and then use it with

java -Xshare:on HelloJava
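
To check whether the archive actually helps on your machine, here is a minimal sketch of a probe class. The class name StartupProbe is made up for illustration (it is not from mjg's post), and the number it prints is only a rough indication, since loading the management classes itself costs a little time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical probe class: prints how long the JVM has been up when main() is reached.
public class StartupProbe {
    // Milliseconds since the JVM started, as reported by the runtime MXBean.
    static long uptimeMillis() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getUptime();
    }

    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("JVM uptime at main(): " + uptimeMillis() + " ms");
        // Echo the JVM options so each measurement can be matched to its -Xshare setting.
        System.out.println("JVM options: " + runtime.getInputArguments());
    }
}
```

Run it a few times with java -Xshare:off StartupProbe and again with java -Xshare:on StartupProbe, then compare the numbers. Wrapping each run in time, as in the question, gives the end-to-end figure.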

AOT: Ahead-of-Time Compilation (Java 9+; the experimental jaotc tool was removed again in JDK 17)

From mjg's blog

Where CDS does some parts of classloading of core classes in advance, AOT actually compiles bytecode to native code (an ELF-format shared-object file) in advance, and can be applied to any bytecode.

Use SubstrateVM (Java 8+)

Not in the blog post but demonstrated during the talk he gave a few days ago.

From the readme:

Substrate VM is a framework that allows ahead-of-time (AOT) compilation of Java applications under closed-world assumption into executable images or shared objects (ELF-64 or 64-bit Mach-O).

热风软妹 2024-10-07 08:33:17

Just learned about drip today, as an alternative to Nailgun: https://github.com/flatland/drip
Also see this page for some general hints: https://github.com/jruby/jruby/wiki/Improving-startup-time

怼怹恏 2024-10-07 08:33:17

Change your program to a client/server model, where the Java part is a persistent server that is started only once, fed by a client that tells it what to do. The client could be a small Python script telling the server process what files to consume. Maybe send commands via a socket, or signals, up to you.
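
A rough sketch of what the Java side could look like, assuming a line-based protocol: the client sends one file path per line and gets one reply line back. The class name ParseDaemon, port 9999, and the process() body are all placeholders for illustration; the real work (for example a Tika call) would go where process() just reports the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical long-running server: the JVM starts once and handles many requests.
public class ParseDaemon {
    // Placeholder for the real per-file work; here it only reports the file size.
    static String process(String path) {
        try {
            return "OK " + Files.size(Paths.get(path));
        } catch (IOException | RuntimeException e) {
            return "ERR " + e.getClass().getSimpleName();
        }
    }

    // Reply with one line per request line until the client closes its end.
    static void serve(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(process(line.trim()));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 9999; // assumed port, pick any free one
        // Bind to the loopback address only, so the daemon is not reachable from other hosts.
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
                    serve(in, out);
                } catch (IOException e) {
                    System.err.println("client error: " + e.getMessage());
                }
            }
        }
    }
}
```

This version handles one client at a time, which is enough for a single script feeding it paths. The Python side then only needs to open a TCP connection to 127.0.0.1 on the chosen port, write paths, and read the replies.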

江湖正好 2024-10-07 08:33:17

Um... write the documents to a directory (if they're not already) and have the Java program process all of them in one go?
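
Something along these lines, as a sketch only: DirectoryBatch and process() are made-up names, and process() just reports the file size where the real parsing call would go.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch runner: one JVM start, every regular file in a directory processed.
public class DirectoryBatch {
    // Placeholder for the real per-file work; here it only reports name and size.
    static String process(Path file) throws IOException {
        return file.getFileName() + "\t" + Files.size(file);
    }

    // Process each regular file directly inside dir; a failing file does not stop the batch.
    static List<String> processDirectory(Path dir) throws IOException {
        List<String> results = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // skip subdirectories and other non-files
                }
                try {
                    results.add(process(entry));
                } catch (IOException e) {
                    results.add(entry.getFileName() + "\tFAILED");
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory if none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : processDirectory(dir)) {
            System.out.println(line);
        }
    }
}
```

With 10k files the startup cost is then paid once for the whole batch rather than once per file.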

你的心境我的脸 2024-10-07 08:33:17

There are lots of ways to do this - basically anything will work providing it keeps the JVM alive for the duration of all of your batch processing.

e.g., why not just alter the Java program to loop through all the files and process them all in one invocation of the JVM?

Or you could build a simple GUI application in Swing and have some visual way to run the batch (e.g. select target directories, then press "Process All..." button).

Or you could use a Clojure REPL as a way to script the execution of the appropriate Java job....

Or you could create a server process with something like Netty and send all your files through that....
