Unescape XML entities in .NET using XmlReader?

Posted on 2024-10-22 20:23:56

I'm trying to unescape XML entities in a string in .NET (C#), but I don't seem to get it to work correctly.

For example, if I have the string AT&amp;T, it should be translated to AT&T.

One way is to use HttpUtility.HtmlDecode(), but that's for HTML.

So I have two questions about this:

  1. Is it safe to use HttpUtility.HtmlDecode() for decoding XML entities?

  2. How do I use XmlReader (or something similar) to do this? I have tried the following, but that always returns an empty string:

    static string ReplaceEscapes(string text)
    {
        StringReader reader = new StringReader(text);
    
        XmlReaderSettings settings = new XmlReaderSettings();
    
        settings.ConformanceLevel = ConformanceLevel.Fragment;
    
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            return xmlReader.ReadString();
        }
    }
    

Comments (5)

≈。彩虹 2024-10-29 20:23:56

HTML escaping and XML escaping are closely related. As you have said, HttpUtility has both HtmlEncode and HtmlDecode methods. These will also operate on XML, as there are only a few characters that need escaping: <, >, ", ' and & in both HTML and XML.

The downside of using the HttpUtility class is that you need a reference to the System.Web dll, which also brings in a lot of other stuff that you probably don't want.

Specifically for XML, the SecurityElement class has an Escape method that will do the encoding, but does not have a corresponding Unescape method. You therefore have a few options:

  1. Use HttpUtility.HtmlDecode() and put up with a reference to System.Web.
  2. Roll your own decode method that takes care of the special characters (there are only a handful; look at the static constructor of SecurityElement in Reflector to see the full list).
  3. Use a (hacky) solution like:

    public static string Unescape(string text)
    {
        XmlDocument doc = new XmlDocument();
        string xml = string.Format("<dummy>{0}</dummy>", text);
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

Personally, I would use HttpUtility.HtmlDecode() if I already had a reference to System.Web, or roll my own if not. I don't like your XmlReader approach because XmlReader is disposable, which usually indicates that it is using resources that need to be disposed, and so may be a costly operation.
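
The "roll your own" option could look roughly like the sketch below. It is a minimal illustration and not taken from SecurityElement: the class name XmlEntityDecoder is hypothetical, and it assumes the input contains only the five predefined XML entities (numeric character references such as &#38; are not handled).

```csharp
using System;

public static class XmlEntityDecoder
{
    // Decodes only the five predefined XML entities.
    // &amp; is replaced last so that "&amp;lt;" becomes "&lt;" and is not
    // decoded a second time into "<".
    public static string Unescape(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        return text
            .Replace("&lt;", "<")
            .Replace("&gt;", ">")
            .Replace("&quot;", "\"")
            .Replace("&apos;", "'")
            .Replace("&amp;", "&");
    }
}
```

Calling XmlEntityDecoder.Unescape("AT&amp;T") gives AT&T. The order of the Replace calls matters: decoding &amp; first would turn doubly escaped input into the wrong result.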

沙沙粒小 2024-10-29 20:23:56

Your #2 solution can work, but you need to call xmlReader.Read(); (or xmlReader.MoveToContent();) prior to ReadString.

I guess #1 would also be acceptable, even though there are edge cases like &reg;, which is a valid HTML entity but not an XML entity. What should your unescaper do with it? Throw an exception as a proper XML parser would, or just return "®" as the HTML parser would do?
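
The difference can be seen in a small sketch. Two assumptions here are not from the thread: it uses System.Net.WebUtility.HtmlDecode as a stand-in for HttpUtility.HtmlDecode (it does not need a System.Web reference), and the class and method names are made up for illustration. The XML side reuses the wrapper-element trick from the first answer.

```csharp
using System;
using System.Net;
using System.Xml;

public static class EntityEdgeCase
{
    // HTML-style decoding: knows HTML named entities such as &reg;.
    public static string DecodeAsHtml(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    // XML-style decoding: wraps the text in a dummy element and lets the
    // XML parser expand the entities.
    public static string DecodeAsXml(string text)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<dummy>" + text + "</dummy>");
        return doc.DocumentElement.InnerText;
    }

    // Returns true when the XML parser refuses the input, for example
    // because it references an entity that XML does not declare.
    public static bool XmlRejects(string text)
    {
        try
        {
            DecodeAsXml(text);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

With this sketch, DecodeAsHtml("&reg;") returns the ® character, while XmlRejects("&reg;") is true because the XML parser throws an XmlException for the undeclared entity. Which behaviour is preferable depends on whether the input is guaranteed to be XML-escaped.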

梦旅人picnic 2024-10-29 20:23:56

This works:

using (XmlReader xmlReader = XmlReader.Create(reader, settings))
{
    if (xmlReader.Read())
    {
       return xmlReader.ReadString();
    }
}
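
For completeness, the fragment above can be folded back into the method from the question, roughly as follows. This is a sketch: the class name XmlReaderUnescape, the early return for empty input, and the empty-string fallback when Read() finds no node are assumptions added here, not part of the original answer.

```csharp
using System.IO;
using System.Xml;

public static class XmlReaderUnescape
{
    public static string ReplaceEscapes(string text)
    {
        // Assumption: treat null or empty input as "nothing to decode".
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // Fragment conformance lets the reader accept bare text
        // without a root element.
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var reader = new StringReader(text))
        using (var xmlReader = XmlReader.Create(reader, settings))
        {
            // Read() advances to the first node; without it ReadString()
            // is called before any node has been read and returns "".
            return xmlReader.Read() ? xmlReader.ReadString() : string.Empty;
        }
    }
}
```

Calling XmlReaderUnescape.ReplaceEscapes("AT&amp;T") returns AT&T.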

海夕 2024-10-29 20:23:56

I found that the top answer has a small bug if your input text ends with certain white space characters, like carriage returns.

The string "Testing " loses its trailing white space.

If you combine the solution in the question with adrianbanks' wrapper tag you get the following, which works.

    public static string UnescapeUnicode(string line)
    {
        using (StringReader reader = new StringReader("<a>" + line + "</a>"))
        {
            using (XmlReader xmlReader = XmlReader.Create(reader))
            {
                xmlReader.MoveToContent();
                return xmlReader.ReadElementContentAsString();
            }
        }
    }

归属感 2024-10-29 20:23:56

This works as well, and has the least code:

    public static string DecodeString(string encodedString)
    {
        if (string.IsNullOrEmpty(encodedString))
            return string.Empty;
        XmlTextReader xtr = new XmlTextReader(encodedString, XmlNodeType.Element, null);
        if (xtr.Read())
            return xtr.ReadString();
        throw new Exception("Error decoding xml string : " + encodedString);
    }

Update1: hmm, it seems this does not work if encodedString is "", because xtr.Read() then returns false.

Update2: added workaround

Update3: this seems to work even better

    public static string DecodeString(string encodedString)
    {
        XmlTextReader xtr = new XmlTextReader(encodedString, XmlNodeType.Element, null);
        xtr.MoveToContent();
        return xtr.Value;
    }