Is there any way to speed up JVM startup?
It is said that Java is 10x faster than Python in terms of performance. That's what I see from benchmarks too. But what really brings Java down is the JVM startup time.
This is a test I made:
$ time xlsx2csv.py Types\ of\ ESI\ v2.doc-emb-Package-9
...
<output skipped>
real 0m0.085s
user 0m0.072s
sys 0m0.013s
$ time java -jar -client /usr/local/bin/tika-app-0.7.jar -m Types\ of\ ESI\ v2.doc-emb-Package-9
real 0m2.055s
user 0m2.433s
sys 0m0.078s
Same file, a 12 KB MS XLSX file embedded inside a Docx, and Python is about 24x faster!! WTH!!
It takes 2.055 sec for Java.
I know it is all due to startup time, but I need to call it from a script to parse some documents, and I do not want to reinvent the wheel in Python.
But for parsing 10k+ files it is just not practical (at about 2 sec per file, that is over 5.5 hours).
Is there any way to speed it up? (I already tried the -client option and it only helped a little, about 20%.)
My other idea: run it as a long-running daemon and communicate with it locally over UDP or Linux IPC sockets?
Comments (6)
Try Nailgun.
Note: I don't use it personally.
I refer you to Matthew Gilliard's (mjg) blog post on the topic. Any code examples below come straight from there. I won't include timing examples partly to keep this short and partly to induce you to visit his page. Matthew works on the Fn Project so he's very interested in figuring out how to keep startup times low.
Apparently there are a few ways to do it, and some are pretty easy as well. The core idea is that you cache the JVM's initialization cycle instead of executing it on every startup.
Class Data Sharing (CDS)
CDS caches the deterministic (hardware-dependent) startup process of the JDK. It's the easiest and oldest (since 1.5, I believe) trick in the book, and not very well known.
From Oracle
You can create the dump manually by running
...and then use it with
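The two commands are not shown here. On HotSpot the documented flags for this are `-Xshare:dump` (write the shared archive) and `-Xshare:auto` or `-Xshare:on` (use it); that these are the commands the post showed is an assumption. To see whether sharing is actually in effect for a given run, a minimal sketch like the following can help. It relies on the HotSpot convention that the `java.vm.info` property contains the word "sharing" when CDS is active; the class name `CdsCheck` is hypothetical.

```java
import java.lang.management.ManagementFactory;

public class CdsCheck {
    // True when the VM info string reports class data sharing
    // (HotSpot prints e.g. "mixed mode, sharing"; other VMs may differ).
    public static boolean isSharingEnabled(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        // Milliseconds since the JVM started, measured when main() is reached.
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("CDS active: " + isSharingEnabled(vmInfo));
        System.out.println("JVM uptime at main(): " + uptimeMs + " ms");
    }
}
```

Running it once with `-Xshare:off` and once with `-Xshare:auto` gives a rough before/after comparison on your own machine.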
AOT: Ahead of Time Compilation (Java 9+)
From mjg's blog
Use SubstrateVM (Java 8+)
Not in the blog post but demonstrated during the talk he gave a few days ago.
From the readme:
Just learned about drip today, as an alternative to Nailgun: https://github.com/flatland/drip
Also see this page for some general hints: https://github.com/jruby/jruby/wiki/Improving-startup-time
Change your program to a client/server model, where the Java part is a persistent server that is started only once, fed by a client that tells it what to do. The client could be a small Python script telling the server process what files to consume. Maybe send commands via a socket, or signals, up to you.
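A minimal sketch of what such a persistent server could look like, assuming a line-based protocol over a loopback TCP socket: the client writes one file path per line and reads one reply line back. The class name `ParseServer`, the default port 9999 and the `handle` method are all hypothetical; `handle` only reports the file size here and stands in for the real parsing call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParseServer {
    // Placeholder for the real per-document work (e.g. a call into the parser library).
    // Never throws, so one bad path cannot take the server down.
    public static String handle(String pathLine) {
        try {
            Path p = Paths.get(pathLine.trim());
            return "OK " + Files.size(p) + " " + p;
        } catch (IOException | RuntimeException e) {
            return "ERR " + e.getClass().getSimpleName() + " " + pathLine.trim();
        }
    }

    // Serves one client connection: one path per request line, one reply per line.
    static void serve(Socket client) throws IOException {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(handle(line)); // autoflush sends the reply immediately
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9999; // hypothetical default
        // Bind to loopback only, so the server is not reachable from other machines.
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                try {
                    serve(server.accept()); // single-threaded: one client at a time
                } catch (IOException e) {
                    System.err.println("client error: " + e.getMessage());
                }
            }
        }
    }
}
```

The JVM startup cost is then paid once, when the server is launched. The Python side only needs to open a connection to 127.0.0.1 on the chosen port and write paths, which costs milliseconds per file.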
Um... write the documents to a directory (if they're not already) and have the Java program process all of them in one go?
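As a rough sketch of that approach, assuming the documents sit directly inside one directory: a single JVM invocation walks the directory and handles each file in turn. `BatchParse` and `process` are hypothetical names, and `process` only reports the file size as a stand-in for the real parsing step.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BatchParse {
    // Placeholder for the real per-document work; returns a one-line summary.
    public static String process(Path file) throws IOException {
        return file.getFileName() + "\t" + Files.size(file) + " bytes";
    }

    // Processes every regular file directly inside dir (no recursion).
    // A failure on one file is recorded and the batch continues.
    public static List<String> processAll(Path dir) throws IOException {
        List<String> results = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path file : entries) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try {
                    results.add(process(file));
                } catch (IOException e) {
                    results.add(file.getFileName() + "\tFAILED: " + e.getMessage());
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : processAll(dir)) {
            System.out.println(line);
        }
    }
}
```

With 10k files the startup cost is paid once instead of 10k times, and the calling script can read the tab-separated output line by line.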
There are lots of ways to do this - basically anything will work, provided it keeps the JVM alive for the duration of all of your batch processing.
E.g., why not just alter the Java program to loop through all the files and process them all in one invocation of the JVM?
Or you could build a simple GUI application in Swing and have some visual way to run the batch (e.g. select target directories, then press "Process All..." button).
Or you could use a Clojure REPL as a way to script the execution of the appropriate Java job....
Or you could create a server process with something like Netty and send all your files through that....