How to prevent System.Xml.XmlDocument from escaping attribute values

Posted 2024-08-10 19:07:04

I've got an XML doc to deal with that contains attributes like:

<action name="foo -> bar">

If I do a simple load and save:

XmlDocument doc = new XmlDocument();
doc.Load(stInPath);
doc.Save(stOutPath);

The attribute string is escaped:

<action name="foo -&gt; bar">

Which is the very thing I'd want to prevent.

Do you know of any way to do this (other than running a find-and-replace over the whole XML file afterwards)?

Edit: It seems it's a legit behaviour, and that I don't have to worry about this (see Jon Skeet's answer)

Comments (2)

唔猫 2024-08-17 19:07:04

Why do you need it not to apply that escaping?

Any normal parser should then apply the appropriate "unescaping" when it parses it. It sounds like you're trying to test the resulting XML document as a plain-text document, which is rarely a good idea. XML documents should almost always be fed to XML parsers in the next step, at which point this isn't an issue.

I don't know of any way of preventing the .NET XML libraries from doing this, and I'd be somewhat surprised if they had such a facility.
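
To illustrate the point about unescaping, here is a minimal sketch (the class name and the inline XML strings are made up for the example) that parses both spellings of the attribute and compares the values a consumer would actually see:

```csharp
using System;
using System.Xml;

class EscapeRoundTrip
{
    static void Main()
    {
        // Two serialisations of the same attribute: raw '>' and escaped '&gt;'.
        var raw = new XmlDocument();
        raw.LoadXml("<action name=\"foo -> bar\" />");

        var escaped = new XmlDocument();
        escaped.LoadXml("<action name=\"foo -&gt; bar\" />");

        string a = raw.DocumentElement.GetAttribute("name");
        string b = escaped.DocumentElement.GetAttribute("name");

        Console.WriteLine(a);       // foo -> bar
        Console.WriteLine(a == b);  // True: both forms parse to the same string
    }
}
```

Any code that reads the file through an XML parser gets the literal "foo -> bar" either way; only a plain-text comparison of the two files would notice the difference.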

少年亿悲伤 2024-08-17 19:07:04

"Which is the very thing I'd want to prevent."

Really? It isn't generally important at all whether that escaping is applied; the XML infoset for either is the same.

The original document is well-formed as it stands, so it is no surprise that it loads.

'>' is a perfectly valid character to include in an attribute value. The only place '>' may need to be &-escaped in XML is in a ']]>' sequence in text content, due to an obscure and silly rule in the spec.

To avoid having to think about the problem, many XML serialisers habitually escape '>' anywhere in text content or attribute values.

The Canonical XML specification defines one particular way of serialising an XML document so that the output can be compared as a simple string; for example, it states exactly how attributes should be ordered. Canonical XML requires '>' to be escaped in text content, but leaves it unescaped in attribute values. So if you used a Canonical XML serialiser to output your document, you'd get the result you expected for that particular value. (I can't guarantee it'd look how you want for other examples, though.)

You can get a canonicaliser in .NET using XmlDsigC14NTransform (or maybe XmlDsigC14NWithCommentsTransform), something like:

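// Requires: using System.IO; using System.Security.Cryptography.Xml;
XmlDsigC14NTransform transform = new XmlDsigC14NTransform(false); // false: omit comments
transform.LoadInput(doc);
using (Stream stream = (Stream) transform.GetOutput(typeof(Stream)))
using (FileStream file = File.Create(stOutPath))
{
    stream.CopyTo(file); // write the canonical form to the output file
}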
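
For comparison, a self-contained sketch of the same idea that prints the default serialisation next to the canonical one. The class name, the Canonicalize helper and the inline sample document are made up for illustration; on modern .NET, XmlDsigC14NTransform lives in the separate System.Security.Cryptography.Xml package, which this sketch assumes is referenced.

```csharp
using System;
using System.IO;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;

class CanonicalSave
{
    // Hypothetical helper: returns the Canonical XML (C14N) form of a document.
    static string Canonicalize(XmlDocument doc)
    {
        var transform = new XmlDsigC14NTransform(false); // false: omit comments
        transform.LoadInput(doc);
        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<action name=\"foo -&gt; bar\" />");

        // Default serialisation, for comparison with the canonical form.
        Console.WriteLine(doc.OuterXml);

        // Canonical form: '>' in attribute values is not escaped.
        Console.WriteLine(Canonicalize(doc));
    }
}
```

Canonicalisation changes more than this one character: the canonical form has no XML declaration, and empty elements are written as a start tag and end tag pair, so the output may differ from the input in other ways.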