Reading docx (Office Open XML) in PHP
I want to add a Word import function to our CMS; the only problem is that I cannot seem to find a good library for reading docx files (Word 2007).
Does anyone have recommendations? The library should be able to extract the content of the document and basic styling such as italic, bold, and superscript.
Thanks for your help.
Comments (7)
docx files are actually just containers for the document's XML. You should be able to unzip the docx file, go to the word folder inside, and open document.xml, which holds the actual text. Things like fonts and styles live in other XML files in the docx container, so you'll probably want to experiment a bit to figure out what is what and how to match it up (start with the namespaces, I bet). But yes: unzip the file, then use simplexml to convert it into something you can actually work with.
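
The unzip-and-parse approach from this answer can be sketched roughly as follows. This is a minimal illustration, not a complete importer: the function names `readDocxRuns` and `isToggleOn` and the shape of the returned array are made up for this example, and it uses DOMXPath rather than simplexml because namespaced XPath queries are a little easier to write that way. It only looks at formatting set directly on a run (`w:b`, `w:i`, `w:vertAlign`); formatting inherited from styles.xml is ignored.

```php
<?php
// Sketch: pull text runs with bold/italic/superscript flags out of a .docx.
// Assumes the zip and dom extensions are enabled. Names are illustrative.

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// A toggle such as <w:b/> is on unless its w:val is "0", "false" or "off".
function isToggleOn(DOMXPath $xp, DOMNode $run, string $tag): bool
{
    $query = "w:rPr/$tag" . '[not(@w:val="0" or @w:val="false" or @w:val="off")]';
    return $xp->query($query, $run)->length > 0;
}

// Returns a list of paragraphs; each paragraph is a list of runs.
function readDocxRuns(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('w', W_NS);

    $paragraphs = [];
    foreach ($xp->query('//w:body/w:p') as $p) {
        $runs = [];
        foreach ($xp->query('w:r', $p) as $r) {
            // A run may contain several w:t elements; join their text.
            $text = '';
            foreach ($xp->query('w:t', $r) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue;
            }
            $runs[] = [
                'text'        => $text,
                'bold'        => isToggleOn($xp, $r, 'w:b'),
                'italic'      => isToggleOn($xp, $r, 'w:i'),
                'superscript' => $xp->query('w:rPr/w:vertAlign[@w:val="superscript"]', $r)->length > 0,
            ];
        }
        $paragraphs[] = $runs;
    }
    return $paragraphs;
}
```

Calling `readDocxRuns('example.docx')` (the file name is just a placeholder) gives one array per paragraph, which a CMS importer could then map to `<strong>`, `<em>`, and `<sup>` tags. Tables, lists, images, and style inheritance would all need extra handling.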
PHPDocX PRO includes a TransformDoc class that can read .docx (zip) files and generate XHTML (or PDF) from them.
There is a library that does this, though it works with the Zend Framework; maybe it will help you.
It is called phpLiveDocx: http://www.phplivedocx.org/downloads/
The library is licensed under the New BSD license.
I have just found a library with both reading and writing support. Check it out on CodePlex (http://openxmlapi.codeplex.com); it is licensed under GPLv2.
Or, since you asked for a library, you may want to look into something like Docvert. I was just looking around based on your question, and it's my favorite so far for PHP. You give it the location of the Word file, and it transforms the document into something simple, with the attributes and all that good stuff.
Convert the docx document to odt using OpenOffice, then use eZ Components to do the parsing and import. They actually use this import in their CMS, eZ Publish.
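
The conversion step could be scripted from PHP along these lines. This is only a sketch under assumptions: it relies on a LibreOffice-style `soffice` binary on the PATH that accepts `--headless --convert-to`, and older OpenOffice builds may spell these options differently or not support them. The function names are hypothetical, and the eZ Components parsing step is not shown.

```php
<?php
// Sketch: convert a .docx to .odt by shelling out to a headless office binary.
// Assumption: a LibreOffice-style `soffice` command is available on the PATH.

// Builds the command line; kept separate so it can be inspected or logged.
function buildOdtConvertCommand(string $docxPath, string $outDir, string $binary = 'soffice'): string
{
    return sprintf(
        '%s --headless --convert-to odt --outdir %s %s',
        escapeshellcmd($binary),
        escapeshellarg($outDir),
        escapeshellarg($docxPath)
    );
}

// Runs the conversion and returns the path of the resulting .odt file.
function convertDocxToOdt(string $docxPath, string $outDir): string
{
    exec(buildOdtConvertCommand($docxPath, $outDir), $output, $exitCode);
    $odtPath = rtrim($outDir, '/') . '/' . pathinfo($docxPath, PATHINFO_FILENAME) . '.odt';
    if ($exitCode !== 0 || !is_file($odtPath)) {
        throw new RuntimeException("Conversion failed for $docxPath");
    }
    return $odtPath;
}
```

Escaping both paths with `escapeshellarg` matters here because uploaded file names in a CMS are user-controlled.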
Here is a simple working solution I found:
http://webcheatsheet.com/php/reading_the_clean_text_from_docx_odt.php