Runtime.exec causes a duplicate JVM to hang indefinitely until killed (Solaris 10)
All,
We are running a J2EE application on WebLogic Server 9.2 MP2 with a 64-bit JRockit JVM (27.3.1) on Solaris 10.
We use Runtime.exec to call an executable named jfmerge to create PDF documents.
We have found that on Solaris, when Runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, so the duplicated shell JVM is also 5 GB), the major problem is that when this functionality (PDF generation) is under heavy load in our application, the duplicated JVM sometimes never exits.
When the JVM hangs, the servers run into serious problems (extreme application slowness and terminated user sessions), because the entire duplicate JVM gets all 5 GB of its process size written to disk swap.
We have noted the following hung thread, which correlates with a hung JVM process and persists until the process is manually killed:
"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default (self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native, daemon
    at jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native Method)
    at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:30)
    at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
    at java/io/FileInputStream.read(FileInputStream.java:194)
    at java/lang/UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
    at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)
    at java/io/BufferedInputStream.read(BufferedInputStream.java:235)
    ^-- Holding lock: java/io/BufferedInputStream@0xfffffffec6510470[thin lock]
    at gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(FormsBean.java:809)
    at gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(FormsBean.java:750)
    at gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(FormsBean.java:450)
    at gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(FormsBean.java:1371)
    at gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(FormsBean.java:263)
    at gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(FormsBean.java:2377)
    at gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(Forms_qaco28_EOImpl.java:214)
    at gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(FormsAndNoticesDelegate.java:128)
    at gov/v3/actions/common/EndorseDocumentAction.executeRequest(EndorseDocumentAction.java:68)
    at gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(V3CommonDispatchAction.java:532)
    at gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(V3CommonDispatchAction.java:336)
    at gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(V3BaseDispatchAction.java:69)
    at org/apache/struts/action/RequestProcessor.processActionPerform(RequestProcessor.java:484)
    at gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(V3TilesRequestProcessor.java:384)
    at org/apache/struts/action/RequestProcessor.process(RequestProcessor.java:274)
    at org/apache/struts/action/ActionServlet.process(ActionServlet.java:1482)
    at org/apache/struts/action/ActionServlet.doGet(ActionServlet.java:507)
    at gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(V3ControllerServlet.java:110)
    at javax/servlet/http/HttpServlet.service(HttpServlet.java:743)
    at javax/servlet/http/HttpServlet.service(HttpServlet.java:856)
    at weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
    at weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
    at weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:283)
    at weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:175)
    at weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3231)
    at weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
    at weblogic/security/service/SecurityManager.runAs(SecurityManager.java:121)
    at weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2002)
    at weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:1908)
    at weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1362)
    at weblogic/work/ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:181)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
We would like to do a couple of things:
1.) Prevent the spawning of a duplicate JVM, as we do not need any of its functions when executing the simple jfmerge executable, and it creates massive overhead.
2.) In the short term, at least prevent this duplicate JVM from hanging indefinitely.
This answer is late, but we had the same problem, and for us the cause was how Solaris manages memory.
When an application server uses a lot of memory (10 GB in my case) and we want to run a simple "ls", the new process needs 10 GB to start.
Solaris requires that extra 10 GB to be available on the server at fork time. Linux uses a feature known as "copy-on-write", which reduces the overhead of forking a new process.
http://developers.sun.com/solaris/articles/subprocess/subprocess.html
Historical Background and Problem Description
Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.
Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
So you have 3 options.
1.- Execute the Runtime.exec call earlier.
2.- Set up inter-process communication with another Java server, and execute the Runtime.exec call there.
3.- Create a JNI class to call a system C function. I took this option, and it works perfectly.
I put my sample code here.
Java code.
C header code. This is generated with javah -jni CallOS
C code.
I hope this helps.
Are you handling the spawned process's stdout/stderr properly? You need to consume both in separate threads to reliably prevent blocking. See this answer for details. It may be that your process spawning works properly for some jobs and not for others (depending on the quantity of stdout/stderr output, which is what triggers a hang).
On the subject of duplicate processes, I would expect the JVM to fork/exec. This duplicates the Java process (fork) and then replaces it with the new process (exec). I wonder if that's what you're seeing? Note also that I'd expect the OS to implement COW (copy-on-write) so that only the memory pages that differ between the images are duplicated, so in normal circumstances the duplication of the JVM wouldn't consume as much memory as you may think.
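To make the stdout/stderr advice concrete, here is a minimal sketch of draining both streams on separate threads while bounding the wait. The class and method names (ExecWithDrain, Drainer, Result, run) are made up for illustration, and the timeout relies on Process.waitFor(long, TimeUnit) and destroyForcibly(), which only exist on Java 8 and later; on an older JVM such as the one in the question, a watchdog thread calling Process.destroy() would have to stand in for them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

final class ExecWithDrain {

    /** Copies one stream into memory on its own thread so the child never blocks on a full pipe. */
    static final class Drainer extends Thread {
        private final InputStream in;
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

        Drainer(InputStream in) {
            this.in = in;
            setDaemon(true);
        }

        @Override
        public void run() {
            byte[] chunk = new byte[4096];
            try {
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            } catch (IOException ignored) {
                // The stream is closed once the process is destroyed; nothing more to read.
            }
        }

        String text() {
            return buf.toString();
        }
    }

    /** Exit code plus everything the child wrote. */
    static final class Result {
        final int exitCode;
        final String stdout;
        final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Runs the command, drains both streams concurrently, and kills the child on timeout. */
    static Result run(long timeoutSeconds, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        p.getOutputStream().close(); // the child sees EOF on its stdin
        Drainer out = new Drainer(p.getInputStream());
        Drainer err = new Drainer(p.getErrorStream());
        out.start();
        err.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            p.waitFor();
            throw new IOException("timed out after " + timeoutSeconds + " s: " + command[0]);
        }
        // Bounded joins: a grandchild holding the pipe open must not hang the caller.
        out.join(2000);
        err.join(2000);
        return new Result(p.exitValue(), out.text(), err.text());
    }
}
```

A call such as ExecWithDrain.run(60, "jfmerge", ...) would then either return the exit code with the captured output or throw after 60 seconds, instead of leaving an execute thread stuck in read() as in the trace above. The jfmerge arguments are left out because the question does not show them.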
As Brian implied, on Unix the standard way for one process to start another program is to fork into a parent process and a child process. The child process then calls exec to replace itself with the new program. The JVM has to do this to start your jfmerge program.
Normally, the memory size of the child process isn't an issue, because the OS uses copy-on-write to let the two processes share the same memory image until the child calls exec. It could be that the JVM's model for child processes requires it to fork twice, with the grandchild exec'ing jfmerge and the child process managing the grandchild. That would explain the duplicate JVM process you are seeing. The stack trace shows a process blocked reading from an input stream. It may be that jfmerge is running slowly and the process is just hung waiting for jfmerge to produce some output.
What you could do is to get some other process to launch jfmerge, instead of your 5GB JVM. Write a standalone program which just runs jfmerge on demand, and have it communicate with the main process through some form of inter-process communication. This standalone jfmerge server wouldn't require as much memory to operate, so the impact of forked child processes wouldn't be so great.
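As a rough sketch of that idea, the following small Java program could run as its own low-memory process and launch commands on behalf of the big JVM. Everything in it is an assumption made for illustration: the class name LauncherServer, the loopback TCP socket as the IPC mechanism, and the one-line protocol (tab-separated argv in, "exit=<code>" plus the merged output back). A real deployment would restrict it to running jfmerge only rather than arbitrary commands.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class LauncherServer {

    /** Reads a stream to EOF and returns its bytes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Runs one command with stderr merged into stdout; returns "exit=<code>\n<output>". */
    static String runCommand(String[] argv) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(argv);
        pb.redirectErrorStream(true); // a single pipe to drain, so no second thread is needed
        Process p = pb.start();
        p.getOutputStream().close();
        String output = new String(readAll(p.getInputStream()), StandardCharsets.UTF_8);
        return "exit=" + p.waitFor() + "\n" + output;
    }

    /** Serves one connection: the first line holds the tab-separated argv. */
    static void handle(Socket s) throws IOException {
        try (Socket socket = s;
             BufferedReader r = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             Writer w = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            String line = r.readLine();
            if (line == null || line.isEmpty()) {
                return;
            }
            try {
                w.write(runCommand(line.split("\t")));
            } catch (Exception e) {
                w.write("error=" + e + "\n");
            }
            w.flush();
        }
    }

    /** Accept loop; requests run one at a time, which also caps concurrent forks. */
    static void serve(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                handle(server.accept());
            } catch (IOException e) {
                // One failed connection, or the server socket was closed; the loop re-checks.
            }
        }
    }

    /** Client side, meant to be called from the large JVM instead of Runtime.exec. */
    static String request(int port, String... argv) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            Writer w = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8);
            w.write(String.join("\t", argv) + "\n");
            w.flush();
            return new String(readAll(s.getInputStream()), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);
        serve(new ServerSocket(port, 50, InetAddress.getLoopbackAddress()));
    }
}
```

Started with a small heap, this process is what gets duplicated by fork, so the 5 GB application server never has to fork at all; it only opens a socket and calls LauncherServer.request(port, "jfmerge", ...).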