How do I query a Word .docx file in an ASP.NET application?
I would like to upload a Word 2007 (or later) .docx file to my web server and convert its table of contents to a simple XML structure. Doing this on the desktop with traditional VBA seems like it would be easy, but the WordprocessingML XML that makes up the .docx file is confusing to read. Is there a way (without COM) to navigate the document in a more object-oriented fashion?
I highly recommend looking into the Open XML SDK 2.0. It's still a CTP, but I've found it extremely useful for manipulating Office Open XML files such as .docx without having to deal with COM at all. The documentation is a bit sketchy, but the key class to look for is DocumentFormat.OpenXml.Packaging.WordprocessingDocument. You can also pick apart a .docx document by renaming its extension to .zip and digging into the XML files inside. Doing that, it looks like the table of contents is contained in a structured document tag (a w:sdt element), and each heading entry is a hyperlink inside it. After experimenting with it a bit, I found that something like this should work (or at least give you a starting point).
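The steps described above can be sketched without the SDK or COM: open the package as a zip, find the table of contents content control, and read its hyperlinked entries. This is not the original answer's code. It is a minimal Python illustration using only the standard library (zipfile and xml.etree.ElementTree), and it rests on several assumptions:

- The TOC was inserted from Word's built-in gallery, so it sits in a w:sdt whose w:docPartGallery value is "Table of Contents".
- Entry levels come from the English paragraph style IDs TOC1, TOC2, and so on.
- The page number follows the last tab in each entry.

The names toc_to_xml and _entry_title, and the output elements toc/entry, are hypothetical. The same WordprocessingML element names apply if you query word/document.xml with the Open XML SDK or LINQ to XML in C#.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _entry_title(link):
    """Return the text of a TOC hyperlink without its trailing page number.

    ASSUMPTION: Word writes a TOC line as title runs, a <w:tab/>, then a
    PAGEREF field whose cached result (a <w:t>) is the page number. We split
    the text on tabs and drop the last segment when there is more than one.
    """
    segments = [""]
    for el in link.iter():  # document order
        if el.tag == W + "tab":
            segments.append("")
        elif el.tag == W + "t":
            segments[-1] += el.text or ""
    if len(segments) > 1:
        segments = segments[:-1]
    return " ".join(s.strip() for s in segments if s.strip())


def toc_to_xml(docx_file):
    """Build a simple <toc> element from the document's TOC content control.

    docx_file may be a path or a binary file-like object (e.g. an upload stream).
    """
    with zipfile.ZipFile(docx_file) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    toc = ET.Element("toc")
    for sdt in root.iter(W + "sdt"):
        # ASSUMPTION: a gallery-inserted TOC is tagged with this docPartGallery value.
        gallery = sdt.find(f"{W}sdtPr/{W}docPartObj/{W}docPartGallery")
        if gallery is None or gallery.get(W + "val") != "Table of Contents":
            continue
        for p in sdt.iter(W + "p"):
            link = p.find(W + "hyperlink")
            if link is None:
                continue  # e.g. the "Contents" title paragraph
            style = p.find(f"{W}pPr/{W}pStyle")
            style_id = style.get(W + "val") if style is not None else ""
            # ASSUMPTION: English style IDs TOC1, TOC2, ... (localized Word differs).
            level = style_id[3:] if style_id.startswith("TOC") else ""
            entry = ET.SubElement(toc, "entry", level=level,
                                  anchor=link.get(W + "anchor", ""))
            entry.text = _entry_title(link)
    return toc
```

For example, ET.tostring(toc_to_xml("report.docx"), encoding="unicode") would return a flat list of entry elements, each carrying a level and the bookmark anchor the heading links to ("report.docx" is a placeholder path). Because Word caches the TOC text, the result reflects the table of contents as of its last field update, not necessarily the current headings.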
Here is a blog post on querying Open XML WordprocessingML documents using LINQ to XML. Using that code, you can write a query as follows:
Blog post: Open XML SDK and LINQ to XML
-Eric
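LINQ to XML is specific to .NET, but the kind of query the blog post describes can be approximated elsewhere: walk the paragraphs of word/document.xml and filter them on their style. As a rough sketch, and not the code from the post, here is a Python version using xml.etree.ElementTree. It builds a heading outline directly from paragraph styles, which can also serve as a fallback when a document has no table of contents.

It assumes headings use the built-in English style IDs Heading1 through Heading9. Localized or custom heading styles have other IDs and would be missed. The helper names paragraphs and headings are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
HEADING = "Heading"  # ASSUMPTION: English built-in style ID prefix


def paragraphs(docx_file):
    """Yield (style_id, text) for every w:p in the main document part.

    style_id is None when the paragraph has no explicit w:pStyle
    (i.e. it uses the default paragraph style).
    """
    with zipfile.ZipFile(docx_file) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # A paragraph's text is spread across the w:t elements of its runs.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style_id, text


def headings(docx_file):
    """Return [(level, text)] for paragraphs styled Heading1..Heading9."""
    return [
        (int(style_id[len(HEADING):]), text)
        for style_id, text in paragraphs(docx_file)
        if style_id
        and style_id.startswith(HEADING)
        and style_id[len(HEADING):].isdigit()
    ]
```

The generator-plus-filter split mirrors how a LINQ to XML query would usually be written: one query projects each paragraph to a (style, text) pair, and a second one filters and shapes that projection.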
See XML Documents and Data as a starting point. In particular, you'll want to use LINQ to XML.
In general, you do not want to use COM in a .NET application.