Unescaping XML entities using XmlReader in .NET?
I'm trying to unescape XML entities in a string in .NET (C#), but I can't seem to get it to work correctly.

For example, if I have the string AT&amp;T, it should be translated to AT&T.
One way is to use HttpUtility.HtmlDecode(), but that's for HTML. So I have two questions about this:

1. Is it safe to use HttpUtility.HtmlDecode() for decoding XML entities?
2. How do I use XmlReader (or something similar) to do this? I have tried the following, but that always returns an empty string:

    static string ReplaceEscapes(string text)
    {
        StringReader reader = new StringReader(text);

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            return xmlReader.ReadString();
        }
    }

Comments (5)
HTML escaping and XML escaping are closely related. As you have said, HttpUtility has both HtmlEncode and HtmlDecode methods. These will also work on XML, as there are only a few characters that need escaping in both HTML and XML: <, >, ", ' and &.

The downside of using the HttpUtility class is that you need a reference to the System.Web DLL, which also brings in a lot of other stuff that you probably don't want.

Specifically for XML, the SecurityElement class has an Escape method that will do the encoding, but it does not have a corresponding Unescape method. You therefore have a few options:

- use HttpUtility.HtmlDecode() and put up with a reference to System.Web
- roll your own decode method that takes care of the special characters (as there are only a handful; look at the static constructor of SecurityElement in Reflector to see the full list)
- use a (hacky) solution
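The second option above, a hand-rolled decoder, might look like the following minimal sketch. The class and method names (XmlEntityDecoder, Decode) are made up for illustration, and it assumes that only the five predefined XML entities need handling; numeric character references such as &#38; are not decoded.

```csharp
using System;

public static class XmlEntityDecoder
{
    // Decodes only the five predefined XML entities.
    // Numeric character references (e.g. &#38;) are left untouched.
    public static string Decode(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        return text
            .Replace("&lt;", "<")
            .Replace("&gt;", ">")
            .Replace("&quot;", "\"")
            .Replace("&apos;", "'")
            // &amp; is replaced last so that "&amp;lt;" becomes "&lt;", not "<".
            .Replace("&amp;", "&");
    }
}
```

Calling XmlEntityDecoder.Decode("AT&amp;T") yields AT&T. Replacing &amp; last matters: doing it first would decode doubly escaped input twice.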

Personally, I would use HttpUtility.HtmlDecode() if I already had a reference to System.Web, or roll my own if not. I don't like your XmlReader approach as it is IDisposable, which usually indicates that it is using resources that need to be disposed, and so may be a costly operation.

Your #2 solution can work, but you need to call xmlReader.Read(); (or xmlReader.MoveToContent();) prior to ReadString.

I guess #1 would also be acceptable, even though there are edge cases like &reg;, which is a valid HTML entity but not an XML entity. What should your unescaper do with it? Throw an exception, as a proper XML parser would, or just return "®", as an HTML parser would?

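The fix described in this answer can be sketched as follows. This is the question's ReplaceEscapes method with a Read() call added before ReadString(); the enclosing class name XmlReaderUnescaper and the empty-input fallback are assumptions added for illustration.

```csharp
using System.IO;
using System.Xml;

public static class XmlReaderUnescaper
{
    public static string ReplaceEscapes(string text)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Fragment level allows bare text with no root element.
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (StringReader reader = new StringReader(text))
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            // Move from the initial state onto the first node.
            // If there is no node at all, there is nothing to decode.
            if (!xmlReader.Read())
            {
                return string.Empty;
            }

            return xmlReader.ReadString();
        }
    }
}
```

With this change, XmlReaderUnescaper.ReplaceEscapes("AT&amp;T") returns AT&T. Because a real XML parser is doing the work, an entity that XML does not define, such as &reg;, should be expected to raise an XmlException rather than be decoded.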
This works:

I found that the top answer has a small bug if your input text ends with certain whitespace characters, like carriage returns. The string "Testing " loses its trailing whitespace.

If you combine the solution in the question with adrianbanks' wrapper tag, you get the following, which works.
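A minimal sketch of that combination, under some assumptions: the wrapper element name (dummy), the class name XmlWrapperUnescaper, and the use of ReadElementContentAsString() are illustrative choices, not necessarily what this answer used.

```csharp
using System.IO;
using System.Xml;

public static class XmlWrapperUnescaper
{
    public static string ReplaceEscapes(string text)
    {
        // Wrap the text in a throwaway element so the parser treats it as
        // element content; "dummy" is an arbitrary, hypothetical name.
        string wrapped = "<dummy>" + text + "</dummy>";

        using (StringReader reader = new StringReader(wrapped))
        using (XmlReader xmlReader = XmlReader.Create(reader))
        {
            // Skip ahead to the wrapper element, then read its decoded text.
            xmlReader.MoveToContent();
            return xmlReader.ReadElementContentAsString();
        }
    }
}
```

Here XmlWrapperUnescaper.ReplaceEscapes("Testing ") keeps its trailing space. Note that the parser still applies XML line-ending normalization, and input containing raw markup (for example an unescaped < character) will make it throw.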

This works as well, and has the least code:
Update 1: hmm, it seems this does not work if encodeString is "", because xtr.Read() then returns false.
Update 2: added a workaround.
Update 3: this seems to work even better.
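As a rough sketch of the kind of approach these updates describe, assuming xtr is an XmlTextReader created over the string as an XML fragment: the class name XmlTextReaderUnescaper, the method name XmlDecode, and the exact form of the empty-string workaround are guesses for illustration.

```csharp
using System.Xml;

public static class XmlTextReaderUnescaper
{
    public static string XmlDecode(string encodedString)
    {
        // Workaround for empty input, where Read() would return false.
        if (string.IsNullOrEmpty(encodedString))
        {
            return string.Empty;
        }

        // Parse the string as element content (a fragment), with no parser context.
        using (XmlTextReader xtr = new XmlTextReader(encodedString, XmlNodeType.Element, null))
        {
            return xtr.Read() ? xtr.ReadString() : string.Empty;
        }
    }
}
```

For example, XmlTextReaderUnescaper.XmlDecode("AT&amp;T") gives AT&T, and an empty input comes back as an empty string without touching the reader.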