I have been tasked with finding a way to convert a large number of .docx files to DocBook 5. Currently, we open each file in OpenOffice and save it as DocBook. This is time-consuming, and I am confident there is a better way. The files will then be processed further into our custom RELAX NG schema, so the conversion does not need to be flawless. I have looked around and will continue to investigate some leads, but I have not found anything useful yet.
Looking at "Convert doc/docx to semantic HTML", the answers there suggest upCast, but that does not seem appropriate for my needs.
I am looking for something freely available that I can use from the command line, since I would ultimately like to batch process our files. I have included the linux, python, and java tags because those are the environments I am most comfortable in, but I would be willing to bend for the right solution. I am trying to do some research before I go out and reinvent the wheel.
At the risk of earning an archeologist's badge from SX, the answers should include a reference to Pandoc, which does not rely on OpenOffice.
pandoc -f docx -t docbook -o newdocbook.dbk --standalone original.docx
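Since the question asks about batch processing, here is a rough sketch of how the command above could be looped over a directory from Python. It assumes `pandoc` is installed and on the PATH; the function names `pandoc_command` and `convert_all` and the `.dbk` output naming are made up for illustration. Recent Pandoc versions also offer a `docbook5` writer, which may be closer to what the question needs, but check what your installed version supports.

```python
import subprocess
from pathlib import Path


def pandoc_command(src, out_dir):
    """Build the pandoc argument list for one .docx file (hypothetical helper)."""
    src = Path(src)
    target = Path(out_dir) / (src.stem + ".dbk")
    # Same flags as the one-off command above; swap "docbook" for "docbook5"
    # if your pandoc version provides that writer.
    return [
        "pandoc", "-f", "docx", "-t", "docbook",
        "--standalone", "-o", str(target), str(src),
    ]


def convert_all(in_dir, out_dir):
    """Convert every .docx directly inside in_dir, writing results to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(in_dir).glob("*.docx")):
        # Collect failures instead of stopping, since the batch may be large.
        result = subprocess.run(pandoc_command(src, out_dir))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling `convert_all("docs", "out")` would then write one `.dbk` file per input and return the list of files Pandoc could not convert, which can be retried or handled by hand.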
There are several ways to script this, both using external scripts and scripts within OpenOffice. See the following links for some examples:
Some of the above links aren't using Java or Python, but the principles still apply and the scripts are typically short enough that they can be ported (the first example is in Ruby, but it's my personal favorite due to the simplicity).
You can run OpenOffice in server mode and feed the documents to it without having to manually open each one.
One way:
http://code.google.com/p/bungeni-editor/wiki/RunningTheJODConverterServer