Automatically convert a large number of Word documents to XML, modify them, and then convert them to LaTeX, PDF, and HTML
I have a set of about 400 Word documents that are part of a Quality Management System. Word is causing me a lot of grief because a) it handles images in large documents poorly, b) the layout sometimes breaks, and c) it is cumbersome to configure the documentation for different clients.
I can convert single documents by saving them as XML, HTML, or text and then converting them manually into LaTeX, but that is not feasible for 400 documents. I know that I can print Word documents directly to PDF with tools like PrimoPDF, but that is not flexible enough because I need to modify the content.
Is there a way to keep the structure of the document (plain text, headings, tables, images) and transform it into XML? Afterwards I would like to modify the content and transform the XML into HTML, LaTeX, and PDF according to the choices of our clients. Is XSLT a suitable way to transform the XML into the other formats?
Thanks for any advice.
Comments (4)
You could convert your documents to Word 2007 format. Office 2007 documents are ZIP packages of XML files: just change the file extension to .zip and unzip. Also, Microsoft publishes an API for working with Office 2007 documents that is higher-level than working with the XML tags directly.
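As a rough illustration of the unzip-and-read-the-XML idea, here is a minimal Python sketch using only the standard library. It assumes the main text lives in the package part word/document.xml, which is where Word 2007 stores it; the helper names docx_paragraphs and make_demo_docx are made up for this example, and the demo package is a stripped-down stand-in rather than a complete .docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx package.

    `source` is a path or a binary file-like object. Paragraphs inside
    table cells come out as ordinary paragraphs; images and styles are
    ignored in this sketch.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return paragraphs


def make_demo_docx():
    """Build a tiny in-memory package holding only word/document.xml.

    Hypothetical test data, not a complete .docx that Word would open.
    """
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Quality </w:t></w:r><w:r><w:t>Manual</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Scope</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_paragraphs(make_demo_docx()))
```

For real documents you would pass the path of a .docx file to docx_paragraphs; there is no need to rename it to .zip first, since the zipfile module looks at the content rather than the extension.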
For batch converting MS Word to something else, you might have a look at OpenOffice.org.
OpenOffice has a (command line) batch mode for mass conversions. You can also have a look at JodConverter, which converts documents using just that mechanism.
That way you could mass convert Microsoft Word documents to some other format OpenOffice.org supports: perhaps text, perhaps RTF, perhaps OpenOffice XML.
You then have a format that is hopefully easier to convert to LaTeX.
Search for Word and OpenOffice right here on Stack Overflow and you will find results like this one about Word to HTML conversion.
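To make the batch mode a bit more concrete, here is a hedged Python sketch that drives the office suite from the command line. It assumes a LibreOffice-style soffice binary on the PATH that accepts the --headless, --convert-to and --outdir options; older OpenOffice.org releases may need a macro or JodConverter instead. The function names and the folder layout are invented for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc, out_dir, target="odt", soffice="soffice"):
    """Build the command line for converting one document.

    Assumption: `soffice` is a LibreOffice-style binary that understands
    --headless, --convert-to and --outdir.
    """
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(doc)]


def convert_folder(src_dir, out_dir, target="odt"):
    """Convert every .doc/.docx file in src_dir, one process per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for doc in sorted(Path(src_dir).glob("*.doc*")):
        # check=True raises CalledProcessError if the converter fails.
        subprocess.run(build_convert_command(doc, out_dir, target), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert_folder() to actually convert.
    print(build_convert_command("manual.doc", "converted"))
```

Running one process per file is slow for 400 documents but keeps a single bad file from aborting a whole batch silently; the target format ("odt" here) is just one of the formats the suite can write.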
There is advice on Word <--> LaTeX conversions at TUG (TeX Users Group):
http://www.tug.org/utilities/texconv/pctotex.html
It may be worth a look to see whether any of the suggestions and methods meet your requirements.
Not sure how well it works, but there is Word2tex.