Docx to PDF conversion using OpenOffice headless is too slow

Posted on 2024-10-28 05:09:58


I've been using PHPWord to generate docx files, and it's been working great.
But now I also need to make some of those files available as PDFs.

After some research I found PyODConverter, which uses OOo. It seemed quite a good option, since I don't want to depend on third-party web services. I tried it out on my machine and it worked fine, so I've deployed it on my server as well. It took a little longer, but I've managed to get it working there too.

There is, however, a (bad) issue. On the server the conversion takes about 21 seconds, while on my machine it takes no longer than 2. :(
This is way too long for my needs, so I've been trying to find out what might be causing the delay.
Starting OpenOffice in headless mode with socket creation is fine.
So I've been looking at the Python script, trying to find out which instruction causes the slowdown. I've narrowed it down to this line:

context = resolver.resolve("uno:socket,host=127.0.0.1,port=8100;urp;StarOffice.ComponentContext")

This call alone takes about 20 seconds to execute.
Here is the code around it:

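import uno
from com.sun.star.connection import NoConnectException

# Runs inside DocumentConverter.__init__, where self, port and DocumentConversionException are defined
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
try:
    # This resolve() call is the one that takes about 20 s on the server
    context = resolver.resolve("uno:socket,host=127.0.0.1,port=%s;urp;StarOffice.ComponentContext" % port)
except NoConnectException:
    raise DocumentConversionException("failed to connect to OpenOffice.org on port %s" % port)
self.desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)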
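The narrowing-down described above can be done without a profiler by timing candidate statements one at a time. A minimal sketch in plain Python; the `timed` helper is hypothetical and not part of PyODConverter or uno:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Time the enclosed block with a monotonic clock and print the result.

    If a dict is passed as results, the elapsed seconds are also stored
    under label so several statements can be compared afterwards.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print("%s took %.3f s" % (label, elapsed))


# Hypothetical use in the converter (needs a running OOo instance):
#     with timed("resolve"):
#         context = resolver.resolve("uno:socket,host=127.0.0.1,port=8100;urp;StarOffice.ComponentContext")
```

Wrapping each of the three statements (getComponentContext, createInstanceWithContext, resolve) in its own `with timed(...)` block shows which one accounts for the 20 seconds.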

Any clues as to what might be causing this delay?
I've ruled out the document I'm trying to convert, since these operations happen before it is even loaded.
Could it be a problem with 'uno'? Or maybe a missing library is causing unnecessary checks during the resolve() operation?

Any ideas are welcome. :)

Best regards, Restless


可爱暴击 2024-11-04 05:09:58

I managed to eliminate the delay by using a pipe instead of a socket for the connection.

context = resolver.resolve("uno:pipe,name=myuser_OOffice;urp;StarOffice.ComponentContext")
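The only difference between the socket URL in the question and the pipe URL above is the connection part before `;urp;`, and the office process has to be listening on that same endpoint. A small sketch that builds both strings from one description, assuming OOo is started with the usual `-accept` value of the form `pipe,name=...;urp;`; the helper names are hypothetical:

```python
def _connection(kind, params):
    """Connection part shared by the client URL and the office's -accept value."""
    if kind == "socket":
        return "socket,host=%s,port=%d" % (params["host"], params["port"])
    if kind == "pipe":
        return "pipe,name=%s" % params["name"]
    raise ValueError("unsupported connection kind: %r" % (kind,))


def uno_connection_url(kind, **params):
    """URL passed to UnoUrlResolver.resolve() on the client side."""
    return "uno:%s;urp;StarOffice.ComponentContext" % _connection(kind, params)


def accept_string(kind, **params):
    """Matching value for the office's -accept option (assumed form)."""
    return "%s;urp;" % _connection(kind, params)
```

For example, `uno_connection_url("pipe", name="myuser_OOffice")` yields the URL used in this answer, and `accept_string("pipe", name="myuser_OOffice")` the value the office would be started with, so the two cannot drift apart.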

I still have one problem, though: the user executing the Python script must be the same one that started OOo for everything to work. Usually that wouldn't be much of an issue, but I'm trying to run the Python script from my web application and I still haven't managed to get it working.
I'm trying something like this:

exec('sudo -u#1000 -s python path/to/DocumentConverter.py filename.docx filename.pdf');

I'm getting nothing back from this, and I don't understand why. Maybe the user running exec() (www-data) doesn't have permission to run sudo?
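Called with a single argument, PHP's exec() returns only the last line of standard output, so any error sudo prints to stderr is lost. On the PHP side, appending `2>&1` to the command and passing exec()'s optional output and result-code arguments exposes it. The same diagnosis, sketched in Python (the `run_and_report` helper and the argv below are hypothetical; `-n` makes sudo fail immediately instead of waiting for a password):

```python
import subprocess


def run_and_report(argv, timeout=60):
    """Run argv without a shell and return (exit status, stdout, stderr).

    Keeping stderr and the exit status is what reveals why a command
    "returns nothing", e.g. sudo refusing to run for www-data.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


# Hypothetical argv mirroring the command from the question:
#     status, out, err = run_and_report(
#         ["sudo", "-n", "-u", "#1000", "python", "path/to/DocumentConverter.py",
#          "filename.docx", "filename.pdf"])
#     print(status, err)
```

If sudo is the problem, the non-zero status and the stderr text (for example, that a password is required) say so directly.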


倚栏听风 2024-11-04 05:09:58

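Maybe the name resolver on the server doesn't know localhost (that would be odd, but 20 seconds does sound like a DNS timeout). You could try replacing it with 127.0.0.1.

Or maybe the lookup works fine and returns both an IPv6 and an IPv4 address for localhost; the client then tries to connect over IPv6 and fails (i.e. the component may not support IPv6, or may not bind to that interface by default), and only then falls back to IPv4. In that case the remedy is the same: replace localhost with 127.0.0.1.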
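To check which of these two explanations applies on a given server, a stdlib-only sketch can resolve a host name and time a TCP connection attempt to each returned address in order (the `probe` function is hypothetical; port 8100 is the one from the question). A slow lookup points to DNS; an IPv6 address listed first whose attempt fails points to the fallback:

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Resolve host, then try a TCP connect to each returned address in order.

    Returns (lookup_seconds, attempts), where each attempt is a tuple
    (family_name, address, seconds, connected).
    """
    t0 = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    lookup_seconds = time.perf_counter() - t0

    attempts = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        t1 = time.perf_counter()
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                connected = True
            except OSError:  # refused, unreachable or timed out
                connected = False
        attempts.append((family.name, sockaddr[0], time.perf_counter() - t1, connected))
    return lookup_seconds, attempts


# On the server, compare for example:
#     print(probe("localhost", 8100))
#     print(probe("127.0.0.1", 8100))
```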


小鸟爱天空丶 2024-11-04 05:09:58

It's a pity that OpenOffice is so heavy. I was also considering it, but then I found a lighter solution: AbiWord.

I had to generate previews of the first 4 pages of an uploaded document. This is what I did:

abiword document.doc --to=ps --exp-props="pages:1-4"
gs -q -dNOPAUSE -dBATCH -dTextAlphaBits=4  -dGraphicsAlphaBits=4 -r72 -sDEVICE=pnggray -sOutputFile=preview%d.png document.ps
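To drive these two commands from a script (for example, next to the Python converter) rather than a shell, they can be passed as argument lists, which avoids shell quoting. A minimal sketch; the function names are hypothetical, the flags are copied unchanged from the commands above, and it assumes AbiWord writes `document.ps` next to the input, as the second command expects:

```python
import os
import subprocess


def preview_commands(document, first_pages=4, dpi=72, output_pattern="preview%d.png"):
    """Return the two argv lists: AbiWord -> PostScript, then Ghostscript -> PNGs."""
    ps_file = os.path.splitext(document)[0] + ".ps"
    abiword = ["abiword", document, "--to=ps", "--exp-props=pages:1-%d" % first_pages]
    gs = ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
          "-r%d" % dpi, "-sDEVICE=pnggray", "-sOutputFile=" + output_pattern, ps_file]
    return [abiword, gs]


def generate_previews(document, **kwargs):
    """Run both commands in order; raises CalledProcessError if either fails."""
    for argv in preview_commands(document, **kwargs):
        subprocess.run(argv, check=True)
```

Without a shell, the quotes around `pages:1-4` are unnecessary: the shell would strip them anyway, so the argument AbiWord receives is the same.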

So you could get a recent version of AbiWord and try something like this: