How to replace paragraph text using the OpenXML SDK

Posted 2024-10-04 17:55:31

I am parsing some OpenXML Word documents using the .NET OpenXML SDK 2.0. As part of the processing, I need to replace certain sentences with other sentences. While iterating over the paragraphs, I know when I've found something I need to replace, but I am stumped as to how to replace it.

For example, let's say I need to replace the sentence "a contract exclusively for construction work that is not building work." with the HTML snippet below, which refers to SharePoint Reusable Content.

<span class="ms-rtestate-read ms-reusableTextView" contentEditable="false" id="__publishingReusableFragment" fragmentid="/Sites/Sandbox/ReusableContent/132_.000" >a contract exclusively for construction work that is not building work.</span>

PS: I have already worked out the docx-to-HTML conversion using XSLT, so that is not a problem at this stage.

The InnerText property of the Paragraph node gives me the proper text, but the inner text property itself is not settable. So
Regex.Match(currentParagraph.InnerText, currentString).Success
returns true and tells me that the current paragraph contains the text I want.

As I said, InnerText itself is not settable, so I tried creating a new paragraph from the OuterXml, as shown below.

string modifiedOuterxml = Regex.Replace(currentParagraph.OuterXml, currentString, reusableContentString);
OpenXmlElement parent = currentParagraph.Parent;
Paragraph modifiedParagraph = new Paragraph(modifiedOuterxml);
parent.ReplaceChild<Paragraph>(modifiedParagraph, currentParagraph);

Even though I am not too concerned about formatting at this level, and the text doesn't seem to have any, the OuterXml contains extra elements that split the sentence and defeat the regex.


..."16" /><w:lang w:val="en-AU" /></w:rPr><w:t>a</w:t></w:r><w:proofErr w:type="gramEnd" /> <w:r w:rsidRPr="00C73B58"><w:rPr><w:sz w:val="16" /><w:szCs w:val="16" /><w:lang w:val="en-AU" /></w:rPr><w:t xml:space="preserve"> contract exclusively for construction work that is not building work.</w:t></w:r></w:p>

So, in summary, how would I replace the text in an OpenXML Paragraph with other text, even at the expense of losing some of the formatting?

Comments (3)

请帮我爱他 2024-10-11 17:55:31

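Fixed it myself. The key was to remove all the runs and create a new run in the current paragraph:

// Do the replacement on the paragraph's plain text rather than on its XML
string modifiedString = Regex.Replace(currentParagraph.InnerText, currentString, reusableContentString);
// Drop the existing runs (and their formatting), then add a single run holding the new text
currentParagraph.RemoveAllChildren<Run>();
currentParagraph.AppendChild<Run>(new Run(new Text(modifiedString)));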
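
A possible refinement of the approach above, offered as a sketch rather than a definitive implementation: it copies the first run's properties before the runs are removed, so the replacement run keeps that formatting. `ParagraphTextReplacer` and `ReplaceText` are hypothetical names, not SDK members. The sketch uses a plain `string.Replace` instead of a regex so that characters such as `.` are matched literally, and it assumes all of the paragraph's text lives in direct child runs (runs nested inside hyperlinks, for instance, would be left in place).

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphTextReplacer
{
    // Hypothetical helper: replaces oldText in the paragraph's text and
    // rebuilds the paragraph as one run that reuses the first run's formatting.
    public static bool ReplaceText(Paragraph paragraph, string oldText, string newText)
    {
        string current = paragraph.InnerText;
        if (string.IsNullOrEmpty(oldText) || !current.Contains(oldText))
        {
            return false;
        }

        // Clone the first run's properties (if any) before the runs are removed.
        Run firstRun = paragraph.Elements<Run>().FirstOrDefault();
        RunProperties properties = null;
        if (firstRun != null && firstRun.RunProperties != null)
        {
            properties = (RunProperties)firstRun.RunProperties.CloneNode(true);
        }

        paragraph.RemoveAllChildren<Run>();

        Run replacement = new Run();
        if (properties != null)
        {
            // Run properties must come before the text inside a run.
            replacement.AppendChild(properties);
        }
        replacement.AppendChild(new Text(current.Replace(oldText, newText))
        {
            // Keep leading and trailing spaces when Word reads the text back.
            Space = SpaceProcessingModeValues.Preserve
        });
        paragraph.AppendChild(replacement);
        return true;
    }
}
```

Called as `ParagraphTextReplacer.ReplaceText(currentParagraph, currentString, reusableContentString)`, this leaves paragraph-level properties untouched, since only `Run` children are removed.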

゛时过境迁 2024-10-11 17:55:31

Paragraphs keep their text in Text elements (inside runs), so you just have to find the Text element and update its text. Note that this only matches when the whole sentence sits in a single Text element. For example:

// mainPart is the document's MainDocumentPart
var text = mainPart.RootElement.Descendants<Text>().FirstOrDefault(e => e.Text == "a contract exclusively for construction work that is not building work.");
if (text != null)
{
    text.Text = "New text here";
}
mainPart.Document.Save();
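
In the question's XML the sentence is split across two runs ("a" and " contract exclusively..."), so an exact match on a single Text element would not find it. The following is a minimal sketch of one way to handle that case, assuming only the first occurrence needs replacing: it joins the paragraph's Text elements, locates the match, writes the new text into the first element the match touches, and trims the matched portion out of the others. `CrossRunReplacer` and `ReplaceAcrossRuns` are hypothetical names, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CrossRunReplacer
{
    // Hypothetical helper: replaces the first occurrence of oldText even when
    // it spans several Text elements. Runs and their formatting are kept.
    public static bool ReplaceAcrossRuns(Paragraph paragraph, string oldText, string newText)
    {
        if (string.IsNullOrEmpty(oldText))
        {
            return false;
        }

        List<Text> texts = paragraph.Descendants<Text>().ToList();
        string combined = string.Join("", texts.Select(t => t.Text).ToArray());
        int start = combined.IndexOf(oldText, StringComparison.Ordinal);
        if (start < 0)
        {
            return false;
        }
        int end = start + oldText.Length;

        int offset = 0;
        bool inserted = false;
        foreach (Text t in texts)
        {
            string value = t.Text;
            int tStart = offset;
            int tEnd = offset + value.Length;
            offset = tEnd;

            if (tEnd <= start || tStart >= end)
            {
                continue; // this element does not overlap the match
            }

            // Portion of this element covered by the match, in local indices.
            int from = Math.Max(start, tStart) - tStart;
            int to = Math.Min(end, tEnd) - tStart;

            // The new text goes into the first overlapping element only.
            string middle = inserted ? string.Empty : newText;
            inserted = true;

            t.Text = value.Substring(0, from) + middle + value.Substring(to);
            t.Space = SpaceProcessingModeValues.Preserve;
        }
        return true;
    }
}
```

The replacement text takes on the formatting of the run where the match begins; runs that end up empty are left in the paragraph rather than removed.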

所谓喜欢 2024-10-11 17:55:31

Using RemoveAllChildren() and then AppendChild() will indeed lose all styling elements unless you spend another big chunk of code putting them back. Nick Hoang's and Goal Man's approaches are better because they do not lose any styles.

Replacing text works best if you use a well-accepted symbol such as '#' or '|' as a placeholder in a template docx, like this:

var tag = pghBillAmount.Descendants<WordOpenXML.Text>().FirstOrDefault(p => p.Text == "#");
if (tag != null)
{
    tag.Text = order.BillAmount.ToString("C2");
}

Your bold or highlight styles, etc., will still be there.
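
If a template holds several placeholders, the same idea can be applied in one pass. This is only a sketch under stated assumptions: each placeholder occupies a whole Text element by itself, and `TemplateFiller`, `Fill`, and `FillFile` are hypothetical names rather than SDK members.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplateFiller
{
    // Hypothetical helper: for every Text element whose entire content is a
    // key in 'values', swap in the mapped value. Returns the number replaced.
    public static int Fill(OpenXmlElement root, IDictionary<string, string> values)
    {
        int replaced = 0;
        foreach (Text text in root.Descendants<Text>().ToList())
        {
            string key = text.Text ?? string.Empty;
            string value;
            if (values.TryGetValue(key, out value))
            {
                // Only the text changes; the surrounding run keeps its properties.
                text.Text = value;
                replaced++;
            }
        }
        return replaced;
    }

    // Opens the document for editing, fills the body, and saves the main part.
    public static void FillFile(string path, IDictionary<string, string> values)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Fill(doc.MainDocumentPart.Document.Body, values);
            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

For example, `TemplateFiller.Fill(body, new Dictionary<string, string> { { "#", order.BillAmount.ToString("C2") } })` would cover the billing case above.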
