MS Word documents to RTF documents
I have a problem: my application must convert MS Word documents (imported from another system) into RTF documents, so that they can be manipulated with the OOo APIs without errors caused by format incompatibilities.
My question: how can I manipulate MS Word documents directly from my Java application? Are there APIs (like POI or OOo) that let me do this work without any format incompatibility?
My system runs on Linux server machines (like all public production systems), and I have installed only OOo.
Using the OOo Java APIs I can open, manipulate and save the documents, but lately I have been seeing a lot of problems caused by incompatibilities between the closed MS Word format and the OOo OpenDocument format (I am referring to swriter).
In many cases, such as lists with particular bullets (e.g., '-', or nested lists), page numbering (e.g., the "1 of x" format) and many other formatting options, the output document (after manipulation) shows many errors that are due, I think, to incompatibility between the two formats.
Now I am studying Apache POI's capabilities to find out whether I can open an MS Word document with it and save it in RTF, an interchange format that should reduce the incompatibilities to a minimum.
Have you had the same problem? Can you point me to an open source Java library more powerful than POI? Or can you suggest a combined approach, such as POI+iText, for the MS Word to RTF conversion step?
Comments (2)
When I was asked to provide a way to reliably convert a doc to a TIFF, I did some research. There are a number of libraries out there, both free and commercial, that claim to be able to render MS Word documents. None of them provides 100% accurate rendering.
The way I had to do it was to run MS Word in a wrapper and drive it through OLE Automation to do what I needed. Running Word in the background has quite a few gotchas in itself, but with thoughtful design you can make it work.
Your case is even easier than mine, because all you need to do is open the doc and then save it in another format.
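For readers in the asker's situation (Linux, only an office suite installed, no MS Word), the same "open, then save as" step can be sketched by shelling out to a headless office process instead of automating Word. This is not the OLE approach described above, and it inherits whatever import inaccuracies the office suite has. It assumes a LibreOffice-style `soffice` binary on the PATH that supports the `--convert-to` option; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts a Word file to RTF by invoking a headless office process.
final class DocToRtfSketch {

    // Builds the command line. Assumes the binary understands --headless/--convert-to/--outdir.
    static List<String> buildCommand(String sofficeBinary, Path input, Path outputDir) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",
                "--convert-to", "rtf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // The converter is expected to keep the base name and swap the extension for .rtf.
    static Path expectedOutput(Path input, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".rtf");
    }

    // Runs the conversion and returns the path where the RTF file should appear.
    static Path convert(String sofficeBinary, Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, input, outputDir))
                .inheritIO()
                .start();
        // A background office process can hang, so do not wait forever.
        if (!process.waitFor(120, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out for " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException("Converter exited with code " + process.exitValue());
        }
        return expectedOutput(input, outputDir);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: DocToRtfSketch <input.doc> <output-dir>");
            return;
        }
        Path result = convert("soffice", Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Expected output: " + result);
    }
}
```

Keeping the command construction separate from the process launch makes it possible to inspect or log the exact command without starting the office suite, which helps when diagnosing the background-process gotchas mentioned above.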
Edit
@Paolo - There you go. I've been through the same thing: evaluating various packages, OO included, and finding that they are, hmm, less than precise. Of course, it all depends on how strict your customers are about document formatting. Mine were extremely picky, right down to margin sizes and picture positioning.
Another option would be to give your customers (and get their approval of) a list of known imprecisions. Unfortunately, with every new doc you run the chance of hitting a new one.
Docvert lets you set up a web service to convert Word documents to Open Office format. It fails on OLE objects, though.