How to query a Word docx in an ASP.NET application?

Posted 2024-08-02 04:31:43

I would like to upload a Word 2007 (or later) .docx file to my web server and convert its table of contents to a simple XML structure. Doing this on the desktop with traditional VBA seems like it would have been easy, but the WordprocessingML XML data inside the .docx file is confusing to look at. Is there a way (without COM) to navigate the document in a more object-oriented fashion?

Comments (3)

断爱 2024-08-09 04:31:43

I highly recommend looking into the Open XML SDK 2.0. It's still a CTP (Community Technology Preview), but I've found it extremely useful for manipulating Open XML files such as .docx without having to deal with COM at all. The documentation is a bit sketchy, but the key class to look for is DocumentFormat.OpenXml.Packaging.WordprocessingDocument. A .docx file is a ZIP package, so you can pick it apart by renaming the extension to .zip and digging into the XML files inside. Doing that, it looks like a table of contents is stored in a structured document tag (a content control, w:sdt in the XML) and that each heading entry is a hyperlink inside it. After experimenting a bit, I found that something like this should work (or at least give you a starting point):

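using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

List<string> contentList = new List<string>();
// Open read-only; the using block disposes the package when done.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(Filename, false))
{
    // Assumes the first block-level content control (w:sdt) in the body is the TOC.
    SdtBlock contents = wordDoc.MainDocumentPart.Document.Descendants<SdtBlock>().First();
    foreach (Hyperlink section in contents.Descendants<Hyperlink>())
        contentList.Add(section.Descendants<Text>().First().Text); // first text run = heading text
}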
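
For comparison, the same idea can be sketched without the SDK, using only ZipArchive and LINQ to XML from the .NET base class library, and it also produces the simple XML structure the question asks for. This is a minimal sketch under a few assumptions: the main document part is at word/document.xml (where Word puts it; a strict reader would resolve it through _rels/.rels), the TOC is the content control whose w:docPartGallery value is "Table of Contents" (typical for TOCs inserted from Word's built-in gallery), and, as in the answer above, the first text run of each hyperlink is the heading text. The class name TocToXml and the <toc>/<entry anchor="..."> output shape are hypothetical choices for illustration.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class TocToXml
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static XElement Extract(Stream docx)
    {
        XDocument doc;
        // A .docx is a ZIP package; ZipArchive disposes the stream when done.
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            // Assumption: main part at word/document.xml (where Word writes it).
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream s = entry.Open())
                doc = XDocument.Load(s);
        }

        // Assumption: the TOC is the content control (w:sdt) whose
        // w:docPartGallery value is "Table of Contents".
        XElement tocSdt = doc.Descendants(W + "sdt").FirstOrDefault(sdt =>
            sdt.Descendants(W + "docPartGallery")
               .Any(g => (string)g.Attribute(W + "val") == "Table of Contents"));

        var toc = new XElement("toc");
        if (tocSdt == null)
            return toc; // no TOC content control: return an empty <toc/>

        foreach (XElement link in tocSdt.Descendants(W + "hyperlink"))
        {
            // Like the SDK version above: first text run = heading text.
            XElement firstText = link.Descendants(W + "t").FirstOrDefault();
            if (firstText == null)
                continue;
            toc.Add(new XElement("entry",
                new XAttribute("anchor", (string)link.Attribute(W + "anchor") ?? ""),
                firstText.Value));
        }
        return toc;
    }
}
```

For example, TocToXml.Extract(File.OpenRead(path)).ToString() returns the TOC as an XML string; the w:anchor values point at the bookmarks on the headings themselves, which can be handy if you later need to jump from an entry to its section.
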
记忆之渊 2024-08-09 04:31:43

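Here is a blog post on querying Open XML WordprocessingML documents using LINQ to XML. With the helper extension methods defined in that post (Paragraphs(), Comments(), StyleName and Text are not part of the Open XML SDK itself), you can write a query like this: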

using (WordprocessingDocument doc =
    WordprocessingDocument.Open(filename, false))
{
    foreach (var p in doc.MainDocumentPart.Paragraphs())
    {
        Console.WriteLine("Style: {0}   Text: >{1}<",
            p.StyleName.PadRight(16), p.Text);
        foreach (var c in p.Comments())
            Console.WriteLine(
              "  Comment Author:{0}  Text:>{1}<",
              c.Author, c.Text);
    }
}

Blog post: Open XML SDK and LINQ to XML

-Eric
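
Without the blog post's helpers, a comparable paragraph query in plain LINQ to XML might look like the minimal sketch below. It assumes the main part is word/document.xml, returns each paragraph's style ID and concatenated text, and leaves out comments because those live in a separate part (word/comments.xml). The class ParagraphQuery is hypothetical. Note that w:pStyle stores the style ID (for example "Heading1"), which is not necessarily the display name Word shows.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphQuery
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns (style ID, text) for every paragraph in word/document.xml,
    // including paragraphs inside tables.
    public static List<(string StyleId, string Text)> Paragraphs(Stream docx)
    {
        XDocument doc;
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream s = entry.Open())
                doc = XDocument.Load(s);
        }

        return doc.Descendants(W + "p")
            .Select(p => (
                // w:pPr/w:pStyle/@w:val; empty when no explicit style is set.
                StyleId: (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val") ?? "",
                // A paragraph's text is usually split across several runs (w:r/w:t).
                Text: string.Concat(p.Descendants(W + "t").Select(t => t.Value))))
            .ToList();
    }
}
```

You can then print each result with the same Console.WriteLine format string used in the query above.
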

不必在意 2024-08-09 04:31:43

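See "XML Documents and Data" in the .NET Framework documentation as a starting point. In particular, you'll want to use LINQ to XML.

In general, you do not want to use COM (for example, automating Word through interop) in a .NET application, and especially not on a web server.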
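
Following that advice, here is a minimal sketch that reads the TOC with LINQ to XML and no COM at all. Instead of locating the content control, it relies on the paragraph styles Word applies to generated TOC entries, assuming the English built-in style IDs TOC1, TOC2, ... (style IDs can differ in localized documents). It also assumes page numbers are shown, so the text after the last tab in each entry is the page number. The class TocByStyle and the level attribute are hypothetical names; the level is taken from the digit in the style ID.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TocByStyle
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: English built-in TOC style IDs "TOC1" ... "TOC9".
    static readonly Regex TocStyleId = new Regex(@"^TOC([1-9])$");

    public static XElement Extract(Stream docx)
    {
        XDocument doc;
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream s = entry.Open())
                doc = XDocument.Load(s);
        }

        var toc = new XElement("toc");
        foreach (XElement p in doc.Descendants(W + "p"))
        {
            string styleId = (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val");
            Match m = TocStyleId.Match(styleId ?? "");
            if (!m.Success)
                continue;

            // Rebuild the visible text from runs only, so tab-stop definitions
            // in w:pPr/w:tabs are not mistaken for tab characters.
            var sb = new StringBuilder();
            foreach (XElement e in p.Descendants(W + "r").Elements())
            {
                if (e.Name == W + "t") sb.Append(e.Value);
                else if (e.Name == W + "tab") sb.Append('\t');
            }

            // Assumption: page numbers are shown, so drop everything after the last tab.
            string text = sb.ToString();
            int lastTab = text.LastIndexOf('\t');
            if (lastTab >= 0)
                text = text.Substring(0, lastTab);

            toc.Add(new XElement("entry",
                new XAttribute("level", m.Groups[1].Value),
                text.Replace('\t', ' ').Trim()));
        }
        return toc;
    }
}
```

This style-based approach also picks up TOCs that are not wrapped in a content control, at the cost of depending on style IDs; which trade-off is better depends on how the uploaded documents are produced.
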
