We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
If it's purely docx, you can try phpdocx, though I don't know whether it reads or only writes. PHPWord doesn't yet read, only writes (though I'm working on it).
If you only need the properties information, you'll find it all in the /docProps/core.xml file within the zip (and possibly in /docProps/app.xml, depending on exactly which properties you need), so you can bypass most of the files that hold text, styles, images, etc. To verify the file names, [Content_Types].xml lists the core and app properties files under the content types application/vnd.openxmlformats-package.core-properties+xml and application/vnd.openxmlformats-officedocument.extended-properties+xml.
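As a rough illustration of reading the properties directly, here is a minimal sketch that uses PHP's bundled ZipArchive and DOM extensions. The helper name `readCoreProperties()` is made up for this example, and it assumes the core properties part sits at the usual `docProps/core.xml` location rather than looking it up in [Content_Types].xml.

```php
<?php
// Sketch: read core document properties straight from the DOCX zip container.
// readCoreProperties() is a hypothetical helper name, not a library function.
function readCoreProperties(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    // Entry names inside the archive carry no leading slash.
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    if ($xml === false) {
        return []; // the package has no core properties part at this path
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $props = [];
    // Each child element of the root is one property,
    // e.g. dc:title, dc:creator, dcterms:modified.
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $props[$node->localName] = trim($node->textContent);
        }
    }
    return $props;
}
```

Calling `readCoreProperties('report.docx')` (file name assumed) should return an array keyed by local element name, such as `title` and `creator`, for whichever properties the file actually contains.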
EDIT:
If you need headings, then you will need to parse the document, not just the properties. That will mean identifying the heading styles, and parsing the text for entities with those styles.
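One way to approach the heading case is sketched below, again with ZipArchive and DOM only. It assumes the headings use style ids beginning with `Heading` (as in `Heading1`, `Heading2`), which is typical for English-language Word documents but not guaranteed; localized or custom templates may need a lookup in word/styles.xml instead. The function name `extractHeadings()` and the `$stylePrefix` parameter are invented for this example.

```php
<?php
// Sketch: list headings by matching paragraph style ids in word/document.xml.
// extractHeadings() and the default 'Heading' prefix are assumptions.
function extractHeadings(string $docxPath, string $stylePrefix = 'Heading'): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return [];
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // A paragraph references its style via <w:pPr><w:pStyle w:val="..."/>.
    // The prefix is interpolated as-is, so it must not contain double quotes.
    $query = sprintf(
        '//w:p[w:pPr/w:pStyle[starts-with(@w:val, "%s")]]',
        $stylePrefix
    );

    $headings = [];
    foreach ($xpath->query($query) as $paragraph) {
        $style = $xpath->evaluate('string(w:pPr/w:pStyle/@w:val)', $paragraph);
        // Heading text may be split across several runs; join every <w:t>.
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $t) {
            $text .= $t->textContent;
        }
        $headings[] = ['style' => $style, 'text' => $text];
    }
    return $headings;
}
```

Each returned entry pairs the style id with the concatenated text of that paragraph, so the heading level can be read off the style name.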
Codeplex has a number of libraries that can work with MS Office documents.
With the exception of PHPExcel, I do not know how mature those projects are. If there is nothing to help you out there, you can still use DOM.
OpenTBS can read and modify DOCX (and other OpenXML files) documents in PHP using the technique of templates.
No temporary files needed, no command lines, all in PHP.
But if you only need to read part of the DOCX file, you can use the class TbsZip. It can read zip archives (like any OpenXML file, a DOCX is a zip archive containing mostly XML files).
In DOCX files, the headers and footers sub-files are usually "/word/header1.xml" and "/word/footer1.xml".
They exist only if a header/footer is defined.
There may also be an optional pair of XML sub-files for odd-numbered pages (usually "/word/header2.xml" and "/word/footer2.xml"),
and an optional pair of sub-files for the first page (usually "/word/header3.xml" and "/word/footer3.xml").
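To make the sub-file layout concrete, here is a small sketch that collects the text of whichever header and footer parts exist. It deliberately uses PHP's built-in ZipArchive rather than TbsZip, so that it runs without extra includes; the helper name `readHeaderFooterText()` is hypothetical, and the file-name pattern simply follows the usual naming described above.

```php
<?php
// Sketch: collect plain text from every word/headerN.xml and word/footerN.xml.
// readHeaderFooterText() is a hypothetical helper; ZipArchive stands in for TbsZip.
function readHeaderFooterText(string $docxPath): array
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }

    $parts = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Only the header/footer sub-files are of interest here.
        if (!preg_match('#^word/(header|footer)\d+\.xml$#', $name)) {
            continue;
        }
        $doc = new DOMDocument();
        $doc->loadXML($zip->getFromIndex($i));
        // Text lives in <w:t> elements; join them in document order.
        $text = '';
        foreach ($doc->getElementsByTagNameNS($wNs, 't') as $t) {
            $text .= $t->textContent;
        }
        $parts[$name] = $text;
    }
    $zip->close();

    ksort($parts); // stable, name-ordered result
    return $parts;
}
```

A document without any header or footer yields an empty array, which matches the point that these sub-files exist only when a header/footer is defined.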
http://www.tinybutstrong.com/opentbs.php
You could also use these libraries https://poi.apache.org/
and connect them through php java bridge http://php-java-bridge.sourceforge.net/pjb/
- install a tomcat server
- place java bridge in the webapps folder and add the poi libraries
- then you could use these libraries to extract the heading styles.
The API is well documented and you have many options.
A PHP library that does this would be better, but you can try this approach if it works for you or somebody else.