XDocument.Save() removes my `&#xA;` entities
I wrote a tool to repair some XML files (i.e., insert some attributes/values that were missing) using C# and LINQ to XML. The tool loads an existing XML file into an XDocument object. Then it walks down through the nodes to insert the missing data. After that, it calls XDocument.Save() to save the changes out to another directory.
All of that works fine except for one thing: any `&#xA;` entities in the text of the XML file are replaced with a newline character. The entity represents a newline, of course, but I need to preserve the entity in the XML because another consumer needs it in there.
Is there any way to save the modified XDocument without losing the `&#xA;` entities?
Thank you.
Comments (2)
The `&#xA;` entities are technically called “numeric character references” in XML, and they are resolved when the original document is loaded into the `XDocument`. This makes your issue difficult to solve, since there is no way of distinguishing resolved whitespace entities from insignificant whitespace (typically used for formatting XML documents for plain-text viewers) after the `XDocument` has been loaded. Thus, the approach below only applies if your document does not contain any insignificant whitespace.
The `System.Xml` library allows one to preserve whitespace entities by setting the `NewLineHandling` property of the `XmlWriterSettings` class to `Entitize`. However, within text nodes, this would only entitize `\r` to `&#xD;`, and not `\n` to `&#xA;`.
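To see the limitation described above, a small sketch like the following can be used. The class and method names (`EntitizeDemo`, `Serialize`) and the sample element are hypothetical and only serve as an illustration; the sketch serializes a text node containing `\r\n` with `NewLineHandling.Entitize`.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original answer.
public static class EntitizeDemo
{
    // Serializes an element whose text contains "\r\n" using NewLineHandling.Entitize.
    public static string Serialize()
    {
        var element = new XElement("note", "line one\r\nline two");

        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            NewLineHandling = NewLineHandling.Entitize
        };

        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            element.Save(writer);
        }

        return output.ToString();
    }
}
```

As the answer states, the carriage return should come out as `&#xD;` while the line feed is still written as a raw character, which is why this setting alone does not preserve `&#xA;`.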
The easiest solution is to derive from the `XmlWriter` class and override its `WriteString` method to manually replace the whitespace characters with their numeric character references. The `WriteString` method also happens to be the place where .NET escapes characters that are not permitted to appear in text nodes, such as the syntax markers `&`, `<`, and `>`, which are respectively escaped to `&amp;`, `&lt;`, and `&gt;`.
Since `XmlWriter` is abstract, we shall derive from `XmlTextWriter` in order to avoid having to implement all the abstract methods of the former class. A quick-and-dirty implementation loops over the characters of `text`, writes each whitespace character as a character entity (for example with `WriteCharEntity`), and passes every other character to `base.WriteString` via `c.ToString()`.
If intended for use in a production environment, you'd want to do away with the `c.ToString()` part, since it's very inefficient. You can optimize the code by batching substrings of the original `text` that do not contain any of the characters you want to entitize, and feeding each batch into a single `base.WriteString` call.
A word of warning: a naive implementation that replaces the whitespace characters in `text` with the literal strings `&#xD;` and `&#xA;` and then calls `base.WriteString` will not work, since the base `WriteString` method would replace any `&` characters with `&amp;`, thereby causing `\r` to be expanded to `&amp;#xD;`.
Finally, to save your `XDocument` into a destination file or stream, wrap the destination in a `TextWriter`, construct the derived writer over it, and pass that writer to `XDocument.Save`. Hope this helps!
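The quick-and-dirty writer and the save step described above might look like the following sketch. The names `EntitizingXmlWriter`, `EntitizingSave`, and `ToEntitizedString` are hypothetical, and the choice to entitize `\r`, `\n`, and `\t` is an assumption about which whitespace characters matter; the sketch writes to a string, but any `TextWriter` (such as a `StreamWriter` over a file) should work the same way.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical name; derives from XmlTextWriter as the answer suggests.
public class EntitizingXmlWriter : XmlTextWriter
{
    public EntitizingXmlWriter(TextWriter writer) : base(writer) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        foreach (char c in text)
        {
            if (c == '\r' || c == '\n' || c == '\t')
            {
                // Emits a numeric character reference such as &#xA;.
                WriteCharEntity(c);
            }
            else
            {
                // Inefficient (one string per character), but lets the base
                // class keep escaping &, < and > as usual.
                base.WriteString(c.ToString());
            }
        }
    }
}

// Hypothetical helper showing how the writer is passed to XDocument.Save.
public static class EntitizingSave
{
    public static string ToEntitizedString(XDocument document)
    {
        var textWriter = new StringWriter();
        using (var xmlWriter = new EntitizingXmlWriter(textWriter))
        {
            document.Save(xmlWriter);
        }
        return textWriter.ToString();
    }
}
```

Because the entity is written through `WriteCharEntity` rather than as a literal string, the base class does not get a chance to escape the ampersand, which avoids the `&amp;#xD;` problem of the naive approach.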
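Edit: For reference, an optimized version of the overridden `WriteString` method scans `text` once, writes each run of ordinary characters with a single `base.WriteString` call, and writes each whitespace character found in between as a character entity.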
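A minimal sketch of that batching idea follows. The class name `BatchingEntitizingXmlWriter` is hypothetical, and the set of entitized characters (`\r`, `\n`, `\t`) is the same assumption as before; this is one way to write it, not necessarily the code the answer originally showed.

```csharp
using System.IO;
using System.Xml;

// Hypothetical name; same idea as the per-character writer, but batched.
public class BatchingEntitizingXmlWriter : XmlTextWriter
{
    public BatchingEntitizingXmlWriter(TextWriter writer) : base(writer) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        // Start index of the current run of characters that need no entity.
        int start = 0;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r' || c == '\n' || c == '\t')
            {
                // Flush the run that precedes this whitespace character.
                if (start < i)
                {
                    base.WriteString(text.Substring(start, i - start));
                }

                // Write the whitespace character as a numeric character reference.
                WriteCharEntity(c);

                // The next run tentatively starts right after this character.
                start = i + 1;
            }
        }

        // Flush the trailing run, if any.
        if (start < text.Length)
        {
            base.WriteString(text.Substring(start));
        }
    }
}
```

Besides avoiding one string allocation per character, batching keeps surrogate pairs together in a single `base.WriteString` call, which a per-character loop does not.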
If your document contains insignificant whitespace which you want to distinguish from your `&#xA;` entities, you can use the following (much simpler) solution: convert the `&#xA;` character references temporarily to another character (one that is not already present in your document), perform your XML processing, and then convert that character back in the output. For example, using the private-use character `U+E800`, replace every `&#xA;` in the input text with `&#xE800;` before parsing, and replace every `'\uE800'` in the serialized output with `&#xA;`.
Note that, since `XDocument` resolves numeric character references to their corresponding Unicode characters, the `&#xE800;` entities will have been resolved to `'\uE800'` in the output.
Typically, you can safely use any code point from Unicode's “Private Use Area” (`U+E000`–`U+F8FF`). If you want to be extra safe, check that the character is not already present in the document; if it is, pick another character from that range. Since you'll only be using the character temporarily and internally, it does not matter which one you use. In the very unlikely scenario that all private-use characters are already present in the document, throw an exception; however, I doubt that will ever happen in practice.
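The placeholder approach can be sketched as follows. The names `LineFeedPreserver`, `Process`, and `FindUnusedPrivateUseChar` are hypothetical, and the sketch assumes the entity is always spelled exactly `&#xA;` (not `&#10;` or `&#x0A;`) and that checking for the raw placeholder character in the input is a sufficient collision test.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper illustrating the private-use placeholder technique.
public static class LineFeedPreserver
{
    private const string LineFeedEntity = "&#xA;";

    // Parses inputXml, applies the caller's changes, and returns the XML text
    // with the original &#xA; references restored.
    public static string Process(string inputXml, Action<XDocument> modify)
    {
        char placeholder = FindUnusedPrivateUseChar(inputXml);
        string placeholderEntity = "&#x" + ((int)placeholder).ToString("X") + ";";

        // Hide the line-feed references behind the placeholder before parsing.
        string hidden = inputXml.Replace(LineFeedEntity, placeholderEntity);
        XDocument document = XDocument.Parse(hidden, LoadOptions.PreserveWhitespace);

        modify(document);

        // The parser resolved the placeholder reference to the raw character,
        // so the raw character is what gets turned back into &#xA;.
        string serialized = document.ToString(SaveOptions.DisableFormatting);
        return serialized.Replace(placeholder.ToString(), LineFeedEntity);
    }

    // Picks the first Private Use Area character that does not occur in the text.
    private static char FindUnusedPrivateUseChar(string text)
    {
        for (char c = '\uE000'; c <= '\uF8FF'; c++)
        {
            if (text.IndexOf(c) < 0)
            {
                return c;
            }
        }
        throw new InvalidOperationException("No unused private-use character available.");
    }
}
```

A call might look like `LineFeedPreserver.Process(File.ReadAllText(path), doc => doc.Root.SetAttributeValue("fixed", "true"))`, where `path` and the attribute are placeholders. Note that `XDocument.ToString` omits the XML declaration, so it has to be written separately if the consumer needs it.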