Launching wkhtmltopdf from Runtime.getRuntime().exec(): never terminates?

Posted on 2024-10-28 06:29:09


I'm launching wkhtmltopdf from within my Java app (part of a Tomcat server, running in debug mode within Eclipse Helios on Win7 64-bit): I'd like to wait for it to complete, then Do More Stuff.

String cmd[] = {"wkhtmltopdf", htmlPathIn, pdfPathOut};
Process proc = Runtime.getRuntime().exec( cmd, null );

proc.waitFor();

But waitFor() never returns. I can still see the process in the Windows Task Manager (with the command line I passed to exec(): looks fine). AND IT WORKS. wkhtmltopdf produces the PDF I'd expect, right where I'd expect it. I can open it, rename it, whatever, even while the process is still running (before I manually terminate it).

From the command line, everything is fine:

c:\wrk>wkhtmltopdf C:\Temp\foo.html c:\wrk\foo.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

The process exits just fine, and life goes on.

So what is it about Runtime.exec() that's causing wkhtmltopdf to never terminate?

I could grab proc.getInputStream() and look for "Done", but that's... vile. I want something that is more general.

I've tried calling exec() with and without a working directory. I've tried with and without an empty "env" array. No joy.

Why is my process hanging, and what can I do to fix it?

PS: I've tried this with a couple other command line apps, and they both exhibit the same behavior.

Further exec woes.

I'm trying to read standard out and standard error, without success. From the command line, I know the output is supposed to look remarkably like my command line experience above, but when I read the input stream returned by proc.getInputStream(), I immediately get an EOF (-1; I'm using inputStream.read()).

I checked the JavaDoc for Process, and found this

The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the SUBPROCESS TO BLOCK, AND EVEN DEADLOCK.

Emphasis added. So I tried that. The first read() on the standard-out InputStream blocked until I killed the process... WITH WKHTMLTOPDF.

With a generic command line app and no params, so that it should "dump usage and terminate", the same code reads the expected stdout, and then the process terminates.

Interesting!

JVM version issue? I'm using 1.6.0_23. The latest is... v24. I just checked the change log and don't see anything promising, but I'll try updating anyway.

Okay. Don't let the Input Streams fill or they'll block. Check. .close() can also prevent this, but isn't terribly bright.

That works in general (including the generic command line apps I've tested).

In this specific case, however, it falls down. It appears that wkhtmltopdf is using some terminal manipulation/cursor stuff to do an ASCII-graphic progress bar. I believe this is causing the inputStream to immediately return EOF rather than giving me the correct values.

Any ideas? Hardly a deal-breaker, but it would definitely be Nice To Have.


临风闻羌笛 2024-11-04 06:29:09


I had the same exact issue as you and I solved it. Here are my findings:

For some reason, the output from wkhtmltopdf goes to STDERR of the process and NOT STDOUT. I have verified this by calling wkhtmltopdf from Java as well as Perl.

So, for example in Java, you would have to do:

// ProcessBuilder has been the preferred way of creating processes since Java 1.5;
// use it instead of Runtime.getRuntime().exec().
ProcessBuilder pb = new ProcessBuilder("wkhtmltopdf.exe", htmlFilePath, pdfFilePath);
Process process = pb.start();

// Read stderr, not process.getInputStream(): wkhtmltopdf writes its progress there.
BufferedReader errStreamReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
String line = errStreamReader.readLine();
while (line != null)
{
    System.out.println(line); // or whatever else
    line = errStreamReader.readLine();
}

On a side note, if you spawn a process from Java, you MUST read from the stdout and stderr streams (even if you do nothing with them), because otherwise the pipe buffer can fill up, the child blocks on its next write, and waitFor() never returns.
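If the output really is of no interest, one alternative to reading and ignoring it is to have the JVM discard it. The following is a minimal sketch, assuming Java 9 or later (ProcessBuilder.Redirect.DISCARD does not exist in the 1.6 JVM the question mentions); the class name QuietRunner and the method runDiscardingOutput are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class QuietRunner {

    /** Runs a command, throws away everything it prints, and returns its exit code. */
    static int runDiscardingOutput(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                       // fold stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // Java 9+: merged output goes to the null device
        Process process = pb.start();
        process.getOutputStream().close();                  // the child sees EOF if it ever reads stdin
        return process.waitFor();                           // no pipe can fill up, so this returns on exit
    }
}
```

A call might look like QuietRunner.runDiscardingOutput(List.of("wkhtmltopdf", htmlFilePath, pdfFilePath)). The trade-off is that any error message from the tool is lost along with the progress output.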

To futureproof your code, just in case the devs of wkhtmltopdf decide to write to stdout, you can redirect stderr of the child process to stdout and read only one stream like this:

ProcessBuilder pb = new ProcessBuilder("wkhtmltopdf.exe", htmlFilePath, pdfFilePath); 
pb.redirectErrorStream(true); 
Process process = pb.start(); 
BufferedReader inStreamReader = new BufferedReader(new  InputStreamReader(process.getInputStream()));

Actually, I do this in all the cases where I have to spawn an external process from Java. That way I don't have to read two streams.
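Put together, the merged-stream approach could look roughly like the sketch below: it reads the single combined stream to the end and only then calls waitFor(), so the exit code is available to the caller. MergedOutputRunner and its Result type are hypothetical names, and using the platform default charset for the tool's output is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class MergedOutputRunner {

    /** Exit code plus every line the process wrote to stdout or stderr. */
    static final class Result {
        final int exitCode;
        final List<String> lines;

        Result(int exitCode, List<String> lines) {
            this.exitCode = exitCode;
            this.lines = lines;
        }
    }

    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);          // stderr is merged into stdout
        Process process = pb.start();
        process.getOutputStream().close();     // we never write to the child's stdin

        List<String> lines = new ArrayList<>();
        // Assumption: the tool prints text in the platform default charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), Charset.defaultCharset()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);               // draining keeps the pipe from filling up
            }
        }
        int exitCode = process.waitFor();      // the stream is at EOF, so the child has closed its output
        return new Result(exitCode, lines);
    }
}
```

Reading before waiting is the important ordering here: calling waitFor() first would reintroduce the hang whenever the child prints more than the pipe buffer holds.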

You should also read the streams of the spawned process in separate threads if you don't want your main thread to block, since reading from a stream is a blocking operation.
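One way to do that is to hand the reading loop to a worker thread and let the calling thread wait with a deadline. This is only a sketch under a few assumptions: Java 8 or later (for Process.waitFor(long, TimeUnit) and destroyForcibly()), a merged output stream as shown earlier, and hypothetical names (TimedRunner, runWithTimeout).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the original answer.
final class TimedRunner {

    /** Runs a command, collecting its merged output on a worker thread, with a time limit. */
    static List<String> runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException, ExecutionException, TimeoutException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        process.getOutputStream().close();

        ExecutorService readerPool = Executors.newSingleThreadExecutor();
        try {
            // The blocking read loop runs on the worker, not on the calling thread.
            Future<List<String>> output = readerPool.submit(() -> {
                List<String> lines = new ArrayList<>();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(process.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        lines.add(line);
                    }
                }
                return lines;
            });

            // The caller waits for the process itself, but only up to the deadline.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();     // killing the child also ends the worker's read
                throw new TimeoutException("process still running after " + timeoutSeconds + " s");
            }
            return output.get();               // the process has exited, so the reader reaches EOF
        } finally {
            readerPool.shutdownNow();
        }
    }
}
```

The deadline is a design choice rather than a requirement: it turns a converter that never exits into an exception instead of a stuck request thread.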

Hope this helps.

UPDATE: I raised this issue on the project page and was told that this is by design, because wkhtmltopdf supports writing the actual PDF output to STDOUT. Please see the link for more details and Java code.
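That design explains the split: stdout is reserved for binary data and the progress text goes to stderr. A rough sketch of capturing such binary output from Java follows. It assumes Java 9 or later (for InputStream.readAllBytes()), and StdoutCapture and captureStdout are hypothetical names. Passing "-" as the output file name to make wkhtmltopdf write the PDF to stdout is my understanding of its command line and should be checked against the installed version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class StdoutCapture {

    /** Runs a command and returns the raw bytes it wrote to stdout. */
    static byte[] captureStdout(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Keep stderr separate from the binary data. Sending it to the parent's own
        // stderr means no pipe is left unread, so progress text cannot block the child.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        process.getOutputStream().close();     // the child sees EOF if it ever reads stdin

        byte[] bytes;
        try (InputStream out = process.getInputStream()) {
            bytes = out.readAllBytes();        // Java 9+: reads until the child closes stdout
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with code " + exitCode);
        }
        return bytes;
    }
}
```

Under that assumption a call might look like StdoutCapture.captureStdout(List.of("wkhtmltopdf", htmlFilePath, "-")). Inside Tomcat the inherited stderr would end up wherever the server's own stderr goes, and treating every non-zero exit code as a failure may be stricter than a particular tool warrants.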


您的好友蓝忘机已上羡 2024-11-04 06:29:09

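A process has 3 streams: input, output and error. You can read the output and error streams at the same time using separate threads. See this question and its accepted answer, and also this one, for examples.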


人疚 2024-11-04 06:29:09

You should read from the streams in different threads.


在巴黎塔顶看东京樱花 2024-11-04 06:29:09

    // Fragment: inputDir, outPutDir, numOfThreads, tmpwhktmlExePath and LOG come from the enclosing class.
    final Semaphore semaphore = new Semaphore(numOfThreads); // limits how many conversions run at once
    final String whktmlExe = tmpwhktmlExePath;
    int doccount = 0;
    try {
        File fileObject = new File(inputDir);
        for (final File f : fileObject.listFiles()) {

            if (f.getAbsolutePath().endsWith(".html")) {
                doccount++;
                if (doccount > 500) {
                    LOG.info(" done with conversion of 500 docs exiting ");
                    break;
                }
                System.out.println(" inside for before " + semaphore.availablePermits());
                semaphore.acquire(); // blocks until a worker slot is free
                System.out.println(" inside for after " + semaphore.availablePermits() + " ---" + f.getName());
                new java.lang.Thread() {
                    public void run() {
                        try {
                            // Escape the dot and anchor the pattern so only the extension is replaced.
                            String F_ = f.getName().replaceAll("\\.html$", ".pdf");
                            ProcessBuilder pb = new ProcessBuilder(whktmlExe, f.getAbsolutePath(), outPutDir + F_.replaceAll(" ", "_"));
                            pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
                            Process process = pb.start();
                            BufferedReader outReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
                            String line = outReader.readLine();
                            while (line != null)
                            {
                                System.err.println(line); // or whatever else
                                line = outReader.readLine();
                            }
                            process.waitFor(); // output is drained, so this returns once the process exits

                            System.out.println("after completion for " + f.getName());
                        } catch (Exception e) {
                            e.printStackTrace();
                        } finally {
                            System.out.println(" in finally releasing ");
                            semaphore.release();
                        }
                    }
                }.start();
            }
        }
    } catch (Exception ex) {
        LOG.error(" *** Error in pdf generation *** ", ex);
    }

    while (semaphore.availablePermits() < numOfThreads) { // until all threads finish
        LOG.info(" Waiting for all threads to exit " + semaphore.availablePermits() + " --- " + (numOfThreads - semaphore.availablePermits()));
        java.lang.Thread.sleep(10000);
    }
