Appending multiple DOCX files together

Posted on 2024-07-08 06:26:59

I need to use C# to programmatically append several preexisting docx files into a single, long docx file, including special markup like bullets and images. Header and footer information will be stripped out, so those won't be around to cause any problems.

I can find plenty of information about manipulating an individual docx file with .NET Framework 3, but nothing easy or obvious about how you would merge files. There is also a third-party program (Acronis.Words) that will do it, but it is prohibitively expensive.


Automating through Word has been suggested, but my code is going to be running on ASP.NET on an IIS web server, so going out to Word is not an option for me. Sorry for not mentioning that in the first place.


雄赳赳气昂昂 2024-07-15 06:26:59

In spite of all the good suggestions and solutions submitted, I developed an alternative. In my opinion you should avoid using Word in server applications entirely. So I worked with OpenXML, but it did not work with AltChunk. Instead, I add the content to the original body. I receive a list of byte[] instead of a list of file names, but you can easily change the code to suit your needs.

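using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OfficeMergeControl
{
    // OfficeMergeControlException is the author's own exception type; it is not shown in this answer.
    public class CombineDocs
    {
        public byte[] OpenAndCombine(IList<byte[]> documents)
        {
            // The first document becomes the merge target; copy it into an expandable stream.
            MemoryStream mainStream = new MemoryStream();
            mainStream.Write(documents[0], 0, documents[0].Length);
            mainStream.Position = 0;

            int pointer = 1;
            byte[] ret;
            try
            {
                using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
                {
                    XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);
                    XName sectPrName = newBody.Name.Namespace + "sectPr";
                    XElement finalSectPr = newBody.Elements(sectPrName).LastOrDefault();

                    for (pointer = 1; pointer < documents.Count; pointer++)
                    {
                        // The appended documents are only read, so open them read-only and dispose them.
                        using (WordprocessingDocument tempDocument = WordprocessingDocument.Open(new MemoryStream(documents[pointer]), false))
                        {
                            XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);

                            // Append the body's children (not <w:body> itself) and keep the
                            // main document's section properties as the last element.
                            var content = tempBody.Elements().Where(e => e.Name != sectPrName);
                            if (finalSectPr != null) finalSectPr.AddBeforeSelf(content);
                            else newBody.Add(content);
                        }
                    }

                    // Only the body XML is copied; images and other related parts are not.
                    mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                    mainDocument.MainDocumentPart.Document.Save();
                }
                ret = mainStream.ToArray();
            }
            catch (OpenXmlPackageException oxmle)
            {
                throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), oxmle);
            }
            catch (Exception e)
            {
                throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), e);
            }
            return ret;
        }
    }
}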

I hope this helps you.

安静被遗忘 2024-07-15 06:26:59

You don't need to use automation. DOCX files are based on the OpenXML Formats. They are just zip files with a bunch of XML and binary parts (think files) inside. You can open them with the Packaging API (System.IO.Packaging in WindowsBase.dll) and manipulate them with any of the XML classes in the Framework.

Check out OpenXMLDeveloper.org for details.
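
To make the "just zip files" point concrete, here is a minimal sketch that lists the parts of a package and loads the main document XML. It swaps in `System.IO.Compression.ZipArchive` (.NET 4.5 and later) for the Packaging API mentioned above, purely because it needs no extra references. The class and method names are made up for illustration, and `word/document.xml` is only the conventional location of the main part; strictly speaking it should be resolved through the package relationships.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxPeek
{
    // Lists the part names (zip entries) inside a .docx given as a byte array.
    public static List<string> ListParts(byte[] docx)
    {
        using (var zip = new ZipArchive(new MemoryStream(docx), ZipArchiveMode.Read))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Loads the main document part as LINQ to XML; returns null if the entry is missing.
    // Assumes the conventional entry name "word/document.xml".
    public static XDocument LoadMainDocument(byte[] docx)
    {
        using (var zip = new ZipArchive(new MemoryStream(docx), ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return null;
            using (Stream stream = entry.Open())
            {
                return XDocument.Load(stream);
            }
        }
    }

    // Builds a tiny zip with a single XML entry, only to exercise the helpers above.
    // The result is NOT a valid .docx (no [Content_Types].xml, no relationships).
    public static byte[] BuildSampleZip(string entryName, string xml)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                ZipArchiveEntry entry = zip.CreateEntry(entryName);
                using (var writer = new StreamWriter(entry.Open(), Encoding.UTF8))
                {
                    writer.Write(xml);
                }
            }
            return buffer.ToArray();
        }
    }
}
```

Calling `DocxPeek.ListParts(File.ReadAllBytes(path))` on a real file should show entries such as the main document part and any media, which is a quick way to see what a merge has to carry across.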

太阳公公是暖光 2024-07-15 06:26:59

This is very late to the original question and quite a bit has changed, but I thought I would share the way I have written my merge logic. It makes use of the Open XML Power Tools.

public byte[] CreateDocument(IList<byte[]> documentsToMerge)
{
    // Source, WmlDocument and DocumentBuilder come from the OpenXmlPowerTools namespace.
    List<Source> documentBuilderSources = new List<Source>();
    foreach (byte[] documentByteArray in documentsToMerge)
    {
        documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, documentByteArray), false));
    }

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}

Currently this is working very well in our application. I have changed the code a little because my requirement is that each document needs to be processed first. So what gets passed in is a DTO object with the template byte array and the various values that need to be replaced. Here is how my code currently looks; it takes the approach a little further.

public byte[] CreateDocument(IList<DocumentSection> documentTemplates)
{
    // OrderBy requires "using System.Linq;". DocumentSection and ProcessOpenXMLDocument are the author's own code.
    List<Source> documentBuilderSources = new List<Source>();
    foreach (DocumentSection documentTemplate in documentTemplates.OrderBy(dt => dt.Rank))
    {
        // Take the template, replace the items and then push it into the chunk
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(documentTemplate.Template, 0, documentTemplate.Template.Length);

            this.ProcessOpenXMLDocument(templateStream, documentTemplate.Fields);

            documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, templateStream.ToArray()), false));
        }
    }

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}

戏蝶舞 2024-07-15 06:26:59

I wrote a little test app a while ago to do this. My test app worked with Word 2003 documents (.doc) not .docx, but I imagine the process is the same - I should think all you'd have to change is to use a newer version of the Primary Interop Assembly. This code would look a lot neater with the new C# 4.0 features...

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Microsoft.Office.Interop.Word;
using Microsoft.Office.Core;
using System.Runtime.InteropServices;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Start();
        }

        private void Start()
        {
            object fileName = Path.Combine(Environment.CurrentDirectory, @"NewDocument.doc");

            try
            {
                WordApplication = new ApplicationClass();
                var doc = WordApplication.Documents.Add(ref missing, ref missing, ref missing, ref missing);

                try
                {
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc1.doc", doc, false);
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc2.doc", doc, true);

                    doc.SaveAs(ref fileName,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing);
                }
                finally
                {
                    // Always close the merged document, even if adding or saving failed.
                    doc.Close(ref missing, ref missing, ref missing);
                }
            }
            finally
            {
                // Always shut Word down so no orphaned WINWORD.EXE process is left behind.
                if (WordApplication != null)
                {
                    WordApplication.Quit(ref missing, ref missing, ref missing);
                }
            }
        }

        private void AddDocument(string path, Document doc, bool lastDocument)
        {
            object subDocPath = path;
            var subDoc = WordApplication.Documents.Open(ref subDocPath, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing);

            try
            {
                // Target range: the very end of the merged document.
                object docStart = doc.Content.End - 1;
                object docEnd = doc.Content.End;

                // Source range: the whole sub-document.
                object start = subDoc.Content.Start;
                object end = subDoc.Content.End;

                Range rng = doc.Range(ref docStart, ref docEnd);
                rng.FormattedText = subDoc.Range(ref start, ref end);

                if (!lastDocument)
                {
                    InsertPageBreak(doc);
                }
            }
            finally
            {
                subDoc.Close(ref missing, ref missing, ref missing);
            }
        }

        private static void InsertPageBreak(Document doc)
        {
            object docStart = doc.Content.End - 1;
            object docEnd = doc.Content.End;
            Range rng = doc.Range(ref docStart, ref docEnd);

            object pageBreak = WdBreakType.wdPageBreak;
            rng.InsertBreak(ref pageBreak);
        }

        private ApplicationClass WordApplication { get; set; }

        private object missing = Type.Missing;
    }
}

风铃鹿 2024-07-15 06:26:59

You want to use AltChunks and the OpenXml SDK (1.0 at a minimum, 2.0 if you can). Check out Eric White's blog for more details; it is a great resource in general. Here is a code sample that should get you started, if not work immediately.

// Requires System.IO, System.Linq, DocumentFormat.OpenXml.Packaging and
// DocumentFormat.OpenXml.Wordprocessing (typed classes from SDK 2.0 or later).
public void AddAltChunkPart(Stream parentStream, Stream altStream, string altChunkId)
{
    //make sure we are at the start of the stream
    parentStream.Position = 0;
    altStream.Position = 0;
    //push the parentStream into a WordProcessing Document
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(parentStream, true))
    {
        //get the main document part
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        //create an altChunk part by adding a part to the main document part
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        //feed the altChunk stream into the chunk part
        chunk.FeedData(altStream);
        //create an AltChunk element (typed class, used here in place of a raw XElement) to represent the new chunk in the document
        AltChunk newChunk = new AltChunk { Id = altChunkId };
        //Add the chunk to the end of the document (search to last paragraph in body and add at the end)
        Body body = mainPart.Document.Body;
        body.InsertAfter(newChunk, body.Elements<Paragraph>().Last());
        //Finally, save the document
        mainPart.Document.Save();
    }
    //reset position of parent stream
    parentStream.Position = 0;
}

做个少女永远怀春 2024-07-15 06:26:59


It's quite complex, so the code is outside the scope of a forum post (I'd be writing your app for you), but to sum up:

  • Open both documents as Packages.
  • Loop through the second document's parts, looking for images and embedded content.
  • Add these parts to the first package, remembering the new relationship IDs (this involves a lot of stream work).
  • Open the document.xml part in the second document and replace all the old relationship IDs with the new ones.
  • Append all the child nodes, but not the root node, of the second document.xml to the first document.xml.
  • Save all the XmlDocuments and flush the Package.

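
The XML side of those steps (rewriting relationship IDs, then appending the second body's children) might look roughly like the sketch below. It uses LINQ to XML rather than XmlDocument, and it assumes the parts have already been copied and that you have a map from old to new relationship IDs. `BodyMerger`, `AppendBody` and `idMap` are hypothetical names, not part of any library. Dropping the second document's trailing `w:sectPr` is an extra choice made here so that the first document's section properties stay last in the body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class BodyMerger
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Appends the body content of 'second' to 'first', remapping relationship IDs on the way.
    // idMap: old relationship ID in the second package -> new ID in the first package.
    public static void AppendBody(XDocument first, XDocument second, IDictionary<string, string> idMap)
    {
        XElement targetBody = first.Root.Element(W + "body");
        XElement sourceBody = second.Root.Element(W + "body");
        if (targetBody == null || sourceBody == null)
            throw new ArgumentException("Both documents need a w:body element.");

        // Work on a copy so the second document is left untouched.
        XElement copy = new XElement(sourceBody);

        // Rewrite every attribute in the relationships namespace (r:embed, r:id, r:link, ...).
        foreach (XAttribute attr in copy.Descendants().Attributes()
                                        .Where(a => a.Name.Namespace == R).ToList())
        {
            string newId;
            if (idMap.TryGetValue(attr.Value, out newId)) attr.Value = newId;
        }

        // Append the children of the body, not the body (or root) element itself,
        // skipping the second document's body-level section properties.
        var content = copy.Elements().Where(e => e.Name != W + "sectPr");
        XElement sectPr = targetBody.Elements(W + "sectPr").LastOrDefault();
        if (sectPr != null) sectPr.AddBeforeSelf(content);
        else targetBody.Add(content);
    }
}
```

Building `idMap` is the stream-heavy part described above: each image or embedded part copied into the first package gets a new relationship ID, and that pair goes into the map before `AppendBody` is called.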
小兔几 2024-07-15 06:26:59

I made an application in C# to merge RTF files into one document; I'm hopeful it will work for DOC and DOCX files as well.

    // Fragment from inside a method. It assumes "using Word = Microsoft.Office.Interop.Word;" and that
    // outputFileName, defaultWordDocumentTemplate, pageCounter and filestoDelete are defined elsewhere.
    Word._Application wordApp = new Word.Application();
    Word._Document wordDoc;
    object outputFile = outputFileName;
    object missing = System.Type.Missing;
    object vk_false = false;
    object defaultTemplate = defaultWordDocumentTemplate;
    object pageBreak = Word.WdBreakType.wdPageBreak;
    string[] filesToMerge = new string[pageCounter];
    filestoDelete = new string[pageCounter];

    for (int i = 0; i < pageCounter; i++)
    {
        filesToMerge[i] = @"C:\temp\temp" + i.ToString() + ".rtf";
        filestoDelete[i] = @"C:\temp\temp" + i.ToString() + ".rtf";
    }

    try
    {
        wordDoc = wordApp.Documents.Add(ref missing, ref missing, ref missing, ref missing);
    }
    catch (Exception ex)
    {
        // The original handler body is not shown; log and rethrow so wordDoc is never used unassigned.
        Console.WriteLine(ex.Message);
        throw;
    }
    Word.Selection selection = wordApp.Selection;

    foreach (string file in filesToMerge)
    {
        // Insert each file at the current selection, followed by a page break.
        selection.InsertFile(file,
            ref missing,
            ref missing,
            ref missing,
            ref missing);

        selection.InsertBreak(ref pageBreak);
    }
    wordDoc.SaveAs(ref outputFile, ref missing, ref missing, ref missing, ref missing, ref missing,
           ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
           ref missing, ref missing);

Hope this helps!

甜中书 2024-07-15 06:26:59


For anyone who wants to work with a list of file names:

void AppendToExistingFile(string existingFile, IList<string> filenames)
{
    using (WordprocessingDocument document = WordprocessingDocument.Open(existingFile, true))
    {
        MainDocumentPart mainPart = document.MainDocumentPart;

        // Walk the list backwards: each chunk is inserted right after the same last
        // paragraph, so reverse iteration leaves the files in their original order.
        for (int i = filenames.Count - 1; i >= 0; --i)
        {
            string altChunkId = "AltChunkId" + i;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);

            using (FileStream fileStream = File.Open(filenames[i], FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }

            AltChunk altChunk = new AltChunk { Id = altChunkId };
            mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        }

        mainPart.Document.Save();
    }
}
