Duplicating a Word document using OpenXml and C#

Posted on 2024-07-27 04:59:09

I am using Word and OpenXml to provide mail merge functionality in a C# ASP.NET web application:

1) A document is uploaded with a number of pre-defined strings for substitution.

2) Using the OpenXML SDK 2.0 I open the Word document, get the mainDocumentPart as a string and perform the substitution using Regex.

3) I then create a new document using OpenXML, add a new mainDocumentPart and insert the string resulting from the substitution into this mainDocumentPart.

However, all formatting/styles etc. are lost in the new document.

I'm guessing I can copy and add the Style, Definitions, Comment parts etc. individually to mimic the original document.

However, is there a method using Open XML to duplicate a document, allowing me to perform the substitutions on the new copy?

Thanks.
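For reference, the workflow described in the question can be sketched roughly as follows, with one change: the uploaded file is copied first and the substitution is done on the copy, so the style, numbering and comment parts are never lost. This is only a sketch under assumptions: `MailMergeSketch`, `CopyAndReplace` and the parameter names are hypothetical, and it assumes each placeholder sits inside a single text run, since Word may split a string across several runs, in which case a search over the raw XML will not find it.

```csharp
using System.IO;
using System.Security;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static class MailMergeSketch
{
    // Hypothetical helper: copies templatePath to outputPath and replaces
    // every occurrence of placeholder in the main document part with value.
    public static void CopyAndReplace(string templatePath, string outputPath,
                                      string placeholder, string value)
    {
        // A plain file copy keeps every part of the package intact.
        File.Copy(templatePath, outputPath, true);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            // Read the main document part as a string.
            string xml;
            using (var reader = new StreamReader(doc.MainDocumentPart.GetStream()))
            {
                xml = reader.ReadToEnd();
            }

            // Escape the value so characters such as '&' or '<' stay valid XML.
            string escaped = SecurityElement.Escape(value);
            xml = Regex.Replace(xml, Regex.Escape(placeholder), m => escaped);

            // Overwrite the part with the substituted XML.
            using (var writer = new StreamWriter(doc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                writer.Write(xml);
            }
        }
    }
}
```

A call might look like `MailMergeSketch.CopyAndReplace("template.docx", "letter.docx", "##NAME##", "Ann")`, where the file names and the `##NAME##` marker are made up for illustration.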

Comments (6)

池木 2024-08-03 04:59:10

This piece of code should copy all parts from an existing document to a new one.

using (var mainDoc = WordprocessingDocument.Open(@"c:\sourcedoc.docx", false))
using (var resultDoc = WordprocessingDocument.Create(@"c:\newdoc.docx",
  WordprocessingDocumentType.Document))
{
  // copy parts from source document to new document
  foreach (var part in mainDoc.Parts)
    resultDoc.AddPart(part.OpenXmlPart, part.RelationshipId);
  // perform replacements in resultDoc.MainDocumentPart
  // ...
}

¢蛋碎的人ぎ生 2024-08-03 04:59:10

I second the recommendation to use content controls. Using them to mark up the areas of your document where you want to perform substitution is by far the easiest way to do it.

As for duplicating the document (and retaining the entire document contents, styles and all) it's relatively easy:

using System.IO;
using DocumentFormat.OpenXml.Packaging;

string documentPath = "full path to your document";
byte[] docAsArray = File.ReadAllBytes(documentPath);

// The parameterless constructor gives an expandable stream, which is needed for editing.
using (MemoryStream stream = new MemoryStream())
{
    stream.Write(docAsArray, 0, docAsArray.Length);    // THIS performs the doc copy
    using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
    {
        // perform content control substitution here, making sure to call .Save()
        // on any document parts that changed.
    }
    // The document must be disposed (as above) before the stream is written out.
    File.WriteAllBytes("full path of your new doc to save, including .docx", stream.ToArray());
}

Actually finding the content controls is a piece of cake using LINQ. The following example finds all the Simple Text content controls (which are typed as SdtRun):

using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
{                    
    var mainDocument = doc.MainDocumentPart.Document;
    var contentControls = from sdt in mainDocument.Descendants<SdtRun>() select sdt;

    foreach (var cc in contentControls)
    {
        // drill down through the containment hierarchy to get to 
        // the contained <Text> object
        cc.SdtContentRun.GetFirstChild<Run>().GetFirstChild<Text>().Text = "my replacement string";
    }
}

The <Run> and <Text> elements may not already exist, in which case the line above throws a NullReferenceException, but creating them is as simple as:

cc.SdtContentRun.Append(new Run(new Text("my replacement string")));
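The two cases above (elements present, elements missing) can be folded into one small helper. This is a minimal sketch, not part of the SDK: `SdtRunSketch` and `SetText` are hypothetical names, and it only touches the first <Run> and <Text> it finds in the content control.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class SdtRunSketch
{
    // Hypothetical helper: sets the text of a run-level content control,
    // creating the content, run and text elements when they are missing.
    public static void SetText(SdtRun cc, string value)
    {
        SdtContentRun content = cc.SdtContentRun;
        if (content == null)
        {
            content = new SdtContentRun();
            cc.Append(content);
        }

        Run run = content.GetFirstChild<Run>();
        if (run == null)
        {
            run = new Run();
            content.Append(run);
        }

        Text text = run.GetFirstChild<Text>();
        if (text == null)
        {
            text = new Text();
            run.Append(text);
        }

        text.Text = value;
    }
}
```

Inside the foreach loop shown earlier, the body would then become `SdtRunSketch.SetText(cc, "my replacement string");`.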

Hope that helps someone. :D

薔薇婲 2024-08-03 04:59:10

The original question was asked before a number of helpful features were added to the Open XML SDK. Nowadays, if you already have an opened WordprocessingDocument, you would simply clone the original document and perform whatever transformation on that clone.

// Say you have done this somewhere before you want to duplicate your document.
using WordprocessingDocument originalDoc = WordprocessingDocument.Open("original.docx", false);

// Then this is how you can clone the opened WordprocessingDocument.
using var newDoc = (WordprocessingDocument) originalDoc.Clone("copy.docx", true);

// Perform whatever transformation you want to do.
PerformTransformation(newDoc);

You can also clone on a Stream or Package. Overall, you have the following options:

OpenXmlPackage Clone()

OpenXmlPackage Clone(Stream stream)
OpenXmlPackage Clone(Stream stream, bool isEditable)
OpenXmlPackage Clone(Stream stream, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(string path)
OpenXmlPackage Clone(string path, bool isEditable)
OpenXmlPackage Clone(string path, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(Package package)
OpenXmlPackage Clone(Package package, OpenSettings openSettings)

Have a look at the Open XML SDK documentation for details on those methods.

Having said that, if you have not yet opened the WordprocessingDocument, there are faster ways to duplicate, or clone, the document. I've demonstrated this in my answer on the most efficient way to clone Office Open XML documents.
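One way to duplicate a document that has not been opened yet is to copy the file itself and only then open the copy. The sketch below illustrates that idea; it is not taken from the linked answer, and `CopyThenOpenSketch` and `CopyAndOpen` are hypothetical names.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class CopyThenOpenSketch
{
    // Hypothetical helper: copies the file, then opens the copy for editing.
    // The caller is responsible for disposing the returned document.
    public static WordprocessingDocument CopyAndOpen(string sourcePath, string targetPath)
    {
        // Plain file copy: no Open XML parsing happens at this point.
        File.Copy(sourcePath, targetPath, true);

        // Open the copy in editable mode.
        return WordprocessingDocument.Open(targetPath, true);
    }
}
```

With the file names from the example above, usage would be `using var newDoc = CopyThenOpenSketch.CopyAndOpen("original.docx", "copy.docx");` followed by the same `PerformTransformation(newDoc);` call.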

层林尽染 2024-08-03 04:59:10

I have done some very similar things, but instead of using text substitution strings, I use Word Content Controls. I have documented some of the details in the following blog post, SharePoint and Open Xml. The technique is not specific to SharePoint. You could reuse the pattern in pure ASP.NET or other applications.

Also, I would STRONGLY encourage you to review Eric White's Blog for tips, tricks and techniques regarding Open Xml. Specifically, check out the in-memory manipulation of Open Xml post, and the Word content controls posts. I think you'll find these much more helpful in the long run.

Hope this helps.

陪你到最终 2024-08-03 04:59:10

As an addendum to the above: what's perhaps more useful is finding content controls that have been tagged (using the Word GUI). I recently wrote some software that populated document templates that contained content controls with tags attached. Finding them is just an extension of the above LINQ query:

// W is the WordprocessingML main namespace URI, needed by GetAttribute below.
const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

var mainDocument = doc.MainDocumentPart.Document;
var taggedContentControls = from sdt in mainDocument.Descendants<SdtElement>()
                            let sdtPr = sdt.GetFirstChild<SdtProperties>()
                            let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
                            where (tag != null)
                            select new
                            {
                                SdtElem = sdt,
                                TagName = tag.GetAttribute("val", W).Value
                            };

I got this code from elsewhere but cannot remember where at the moment; full credit goes to them.

The query just creates an IEnumerable of an anonymous type that contains the content control and its associated tag as properties. Handy!
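To show how such tags might be used for the actual substitution, here is a rough sketch that fills every tagged content control from a dictionary of tag names to values. The names `TaggedControlSketch` and `Fill` are hypothetical. It makes two simplifying assumptions: it only overwrites the first existing <Text> element inside each control (controls without one are skipped), and it does not treat nested content controls specially.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TaggedControlSketch
{
    // Hypothetical helper: sets the text of each content control whose tag
    // is a key in values. Returns the number of controls that were updated.
    public static int Fill(WordprocessingDocument doc, IDictionary<string, string> values)
    {
        Document mainDocument = doc.MainDocumentPart.Document;
        int updated = 0;

        // ToList() takes a snapshot so the tree is not enumerated lazily while it is edited.
        foreach (SdtElement sdt in mainDocument.Descendants<SdtElement>().ToList())
        {
            SdtProperties sdtPr = sdt.GetFirstChild<SdtProperties>();
            Tag tag = sdtPr == null ? null : sdtPr.GetFirstChild<Tag>();
            string tagName = (tag == null || tag.Val == null) ? null : tag.Val.Value;

            string value;
            if (tagName == null || !values.TryGetValue(tagName, out value))
                continue; // untagged control, or no value supplied for this tag

            Text text = sdt.Descendants<Text>().FirstOrDefault();
            if (text == null)
                continue; // assumption: controls without a text element are left alone

            text.Text = value;
            updated++;
        }

        // Write the modified tree back to the main document part.
        mainDocument.Save();
        return updated;
    }
}
```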

空城缀染半城烟沙 2024-08-03 04:59:10

If you change the extension of an Open XML document to .zip and open it, you will see that the word subfolder contains a _rels folder in which all the relationships are listed. These relationships point to the parts you mentioned (styles and so on). You need these parts because they contain the formatting definitions. If you do not copy them, the new document will use the formatting defined in the normal.dot file rather than the formatting defined in the original document. So I think you have to copy them.
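You can also do this inspection from code instead of renaming the file. The sketch below just lists the entries of the package with the standard zip classes; `PackageListingSketch` and `ListEntries` are hypothetical names, and the exact entries you see depend on the document.

```csharp
using System.Collections.Generic;
using System.IO.Compression;

public static class PackageListingSketch
{
    // Hypothetical helper: returns the full names of all entries in a .docx package.
    public static List<string> ListEntries(string docxPath)
    {
        var names = new List<string>();

        // A .docx file is a zip archive, so it can be opened read-only as one.
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }

        return names;
    }
}
```

For a typical Word document the list includes entries such as word/document.xml, word/styles.xml and word/_rels/document.xml.rels, the last one being the relationships file mentioned above.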
