Dealing with invalid XML hexadecimal characters

Posted on 2024-12-16 13:36:02

I'm trying to send an XML document over the wire but receiving the following exception:

"MY LONG EMAIL STRING" was specified for the 'Body' element. ---> System.ArgumentException: '', hexadecimal value 0x02, is an invalid character.
   at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
   at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
   at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
   at System.Xml.XmlRawWriter.WriteValue(String value)
   at System.Xml.XmlWellFormedWriter.WriteValue(String value)
   at Microsoft.Exchange.WebServices.Data.EwsServiceXmlWriter.WriteValue(String value, String name)
   --- End of inner exception stack trace ---

I don't have any control over what I attempt to send because the string is gathered from an email. How can I encode my string so it's valid XML while keeping the illegal characters?

I'd like to keep the original characters one way or another.

Comments (8)

转角预定愛 2024-12-23 13:36:02

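The following code removes invalid XML characters from a string and returns a new string without them:

// requires: using System.Text.RegularExpressions;
public static string CleanInvalidXmlChars(string text)
{
     // From the XML 1.0 spec, valid chars are:
     // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     // i.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
     // In a .NET regex \x takes exactly two hex digits, so BMP code points are written as \uXXXX.
     // Code points above #xFFFF are surrogate pairs in a .NET string: well-formed pairs are kept,
     // lone surrogates are removed.
     string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]"
               + @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])"
               + @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
     return Regex.Replace(text, re, "");
}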
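
Since the question asks to keep the original characters one way or another, a variation on the regex approach is to replace each invalid character with a visible marker instead of deleting it. The following is only a sketch: the class name XmlCharEscaper, the method name EscapeInvalidXmlChars, and the "[0xNN]" marker format are made up for illustration, and the marker is not reversible if the text already contains that literal sequence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlCharEscaper
{
    // Matches characters that XML 1.0 forbids: C0 controls other than tab/LF/CR,
    // U+FFFE, U+FFFF, and surrogates that are not part of a well-formed pair.
    private static readonly Regex Invalid = new Regex(
        @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]");

    // Replaces each invalid character with a marker such as "[0x02]" (hypothetical format).
    public static string EscapeInvalidXmlChars(string text)
    {
        return Invalid.Replace(text, m => "[0x" + ((int)m.Value[0]).ToString("X2") + "]");
    }

    public static void Main()
    {
        Console.WriteLine(EscapeInvalidXmlChars("Hello\u0002World")); // Hello[0x02]World
    }
}
```

The result is plain text that any XML writer accepts, and a human reader can still see where the control characters were.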
心意如水 2024-12-23 13:36:02
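// UTF-8 rather than ASCII, so that non-ASCII characters are not replaced with '?'
byte[] toEncodeAsBytes
      = System.Text.Encoding.UTF8.GetBytes(toEncode);
string returnValue
      = System.Convert.ToBase64String(toEncodeAsBytes);

is one way of doing this. The Base64 output contains only XML-safe characters, and the receiver can decode it to recover the original text, control characters included.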
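
For completeness, here is a minimal round-trip sketch of the Base64 idea. The class name Base64Text and its Encode/Decode helpers are made up for illustration, and it assumes both sides agree on UTF-8 as the text encoding.

```csharp
using System;
using System.Text;

public static class Base64Text
{
    // Turns arbitrary text into Base64, which only uses XML-safe characters.
    public static string Encode(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // Reverses Encode on the receiving side.
    public static string Decode(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    public static void Main()
    {
        string original = "caf\u00E9 \u0002 body";
        string encoded = Encode(original);
        Console.WriteLine(encoded);
        Console.WriteLine(Decode(encoded) == original); // True
    }
}
```

The trade-off is that the element content is no longer human-readable and grows by roughly a third, and the receiver has to know that the field is Base64.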

十年九夏 2024-12-23 13:36:02

Another way to remove invalid XML characters in C# is to use the XmlConvert.IsXmlChar method (available since .NET Framework 4.0):

public static string RemoveInvalidXmlChars(string content)
{
   return new string(content.Where(ch => System.Xml.XmlConvert.IsXmlChar(ch)).ToArray());
}

.Net Fiddle - https://dotnetfiddle.net/v1TNus

For example, the vertical tab character (\v) is valid UTF-8 but not valid XML 1.0. Many libraries (including libxml2) miss it and silently output invalid XML.
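
One caveat with a per-char filter: a .NET string stores characters above U+FFFF (emoji, for instance) as surrogate pairs, and as far as I can tell IsXmlChar treats each half on its own as invalid, so such characters get dropped. A sketch of a pair-aware variant using XmlConvert.IsXmlSurrogatePair follows; the class name XmlCharFilter is made up for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Removes characters that are invalid in XML 1.0 but keeps well-formed surrogate pairs.
    public static string RemoveInvalidXmlChars(string content)
    {
        var sb = new StringBuilder(content.Length);
        for (int i = 0; i < content.Length; i++)
        {
            char ch = content[i];
            if (XmlConvert.IsXmlChar(ch))
            {
                sb.Append(ch);
            }
            else if (i + 1 < content.Length
                     && XmlConvert.IsXmlSurrogatePair(content[i + 1], ch)) // (lowChar, highChar)
            {
                // Keep both halves of the pair and skip past the low surrogate.
                sb.Append(ch).Append(content[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string cleaned = RemoveInvalidXmlChars("a\u0002b\U0001F600c");
        Console.WriteLine(cleaned.Length); // 5: 'a', 'b', two chars for the emoji, 'c'
    }
}
```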

心的憧憬 2024-12-23 13:36:02

This works for me (it turns off the writer's character checking):

XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { Encoding = Encoding.UTF8, CheckCharacters = false };
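
To illustrate what that setting changes, here is a small round-trip sketch. It assumes you control both the writer and the reader, which may not hold when a library such as EWS creates the writer for you; the class name CheckCharactersDemo and its Write/Read helpers are made up for illustration. With checking off, the writer should emit the control character as a numeric character reference rather than throwing, and a reader with CheckCharacters = false accepts that reference.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class CheckCharactersDemo
{
    // Writes <Body>...</Body> without validating characters.
    public static string Write(string body)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, CheckCharacters = false };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteElementString("Body", body);
        }
        return sb.ToString();
    }

    // Reads the element text back; the reader must also skip character checking.
    public static string Read(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        string xml = Write("a\u0002b");
        Console.WriteLine(xml);                      // the control character appears as a character reference
        Console.WriteLine(Read(xml) == "a\u0002b");  // True
    }
}
```

Keep in mind that the output is not well-formed XML 1.0, so a strict parser on the other end will still reject it.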
迷路的信 2024-12-23 13:36:02

The following solution removes any invalid XML characters, and I think it does so about as efficiently as possible. In particular, it allocates a new StringBuilder and a new string only once it has determined that the string actually contains an invalid character. So the hot spot ends up being a single for loop over the characters, where the check is often no more than two greater-than / less-than numeric comparisons per char. If no invalid character is found, it simply returns the original string. This is particularly helpful when the vast majority of strings are fine to begin with: they pass straight through as quickly as possible, with no wasted allocations.

-- update --

See below how one can also directly write an XElement that contains these invalid characters; that approach builds on this code.

Some of this code was influenced by Mr. Tom Bogle's solution (https://stackoverflow.com/a/13039301/264031). See also the helpful information in superlogical's post on that same thread. Both of those, however, always instantiate a new StringBuilder and a new string.

USAGE:

    string xmlStrBack = XML.ToValidXmlCharactersString("any string");

TEST:

    public static void TestXmlCleanser()
    {
        string badString = "My name is Inigo Montoya"; // you may not see it, but bad char is in 'MontXoya'
        string goodString = "My name is Inigo Montoya!";

        string back1 = XML.ToValidXmlCharactersString(badString); // fixes it
        string back2 = XML.ToValidXmlCharactersString(goodString); // returns same string

        XElement x1 = new XElement("test", back1);
        XElement x2 = new XElement("test", back2);
        XElement x3WithBadString = new XElement("test", badString);

        string xml1 = x1.ToString();
        string xml2 = x2.ToString().Print();

        string xmlShouldFail = x3WithBadString.ToString();
    }

// --- CODE --- (I have these methods in a static utility class called XML)

    /// <summary>
    /// Determines if any invalid XML 1.0 characters exist within the string,
    /// and if so it returns a new string with the invalid chars removed, else 
    /// the same string is returned (with no wasted StringBuilder allocated, etc).
    /// </summary>
    /// <param name="s">Xml string.</param>
    /// <param name="startIndex">The index to begin checking at.</param>
    public static string ToValidXmlCharactersString(string s, int startIndex = 0)
    {
        int firstInvalidChar = IndexOfFirstInvalidXMLChar(s, startIndex);
        if (firstInvalidChar < 0)
            return s;

        startIndex = firstInvalidChar;

        int len = s.Length;
        var sb = new StringBuilder(len);

        if (startIndex > 0)
            sb.Append(s, 0, startIndex);

        for (int i = startIndex; i < len; i++)
            if (IsLegalXmlChar(s[i]))
                sb.Append(s[i]);

        return sb.ToString();
    }

    /// <summary>
    /// Gets the index of the first invalid XML 1.0 character in this string, else returns -1.
    /// </summary>
    /// <param name="s">Xml string.</param>
    /// <param name="startIndex">Start index.</param>
    public static int IndexOfFirstInvalidXMLChar(string s, int startIndex = 0)
    {
        if (s != null && s.Length > 0 && startIndex < s.Length) {

            if (startIndex < 0) startIndex = 0;
            int len = s.Length;

            for (int i = startIndex; i < len; i++)
                if (!IsLegalXmlChar(s[i]))
                    return i;
        }
        return -1;
    }

    /// <summary>
    /// Indicates whether a given character is valid according to the XML 1.0 spec.
    /// This code represents an optimized version of Tom Bogle's on SO: 
    /// https://stackoverflow.com/a/13039301/264031.
    /// </summary>
    public static bool IsLegalXmlChar(char c)
    {
        if (c > 31 && c <= 55295)
            return true;
        if (c < 32)
            return c == 9 || c == 10 || c == 13;
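        return (c >= 57344 && c <= 65533) || c > 65535;
        // The final comparison (c > 65535) can never be true for a 16-bit char; it only matters if this
        // check is ported to int code points (UTF-32). An upper bound of 1114111 is not needed there either,
        // because Char.ConvertToUtf32 would have thrown before producing a larger code point.
        // Note: because the check is per char, both halves of a surrogate pair are rejected,
        // so characters outside the BMP are removed as well.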
    }

======== ======== ========

Write XElement.ToString directly

======== ======== ========

First, the usage of this extension method:

string result = xelem.ToStringIgnoreInvalidChars();

-- Fuller test --

    public static void TestXmlCleanser()
    {
        string badString = "My name is Inigo Montoya"; // you may not see it, but bad char is in 'MontXoya'

        XElement x = new XElement("test", badString);

        string xml1 = x.ToStringIgnoreInvalidChars();                               
        //result: <test>My name is Inigo Montoya</test>

        string xml2 = x.ToStringIgnoreInvalidChars(deleteInvalidChars: false);
        //result: <test>My name is Inigo Montoya</test>
    }

--- code ---

    /// <summary>
    /// Writes this XML to string while allowing invalid XML chars to either be
    /// simply removed during the write process, or else encoded into entities, 
    /// instead of having an exception occur, as the standard XmlWriter.Create 
    /// XmlWriter does (which is the default writer used by XElement).
    /// </summary>
    /// <param name="xml">XElement.</param>
    /// <param name="deleteInvalidChars">True to have any invalid chars deleted, else they will be entity encoded.</param>
    /// <param name="indent">Indent setting.</param>
    /// <param name="indentChar">Indent char (leave null to use default)</param>
    public static string ToStringIgnoreInvalidChars(this XElement xml, bool deleteInvalidChars = true, bool indent = true, char? indentChar = null)
    {
        if (xml == null) return null;

        StringWriter swriter = new StringWriter();
        using (XmlTextWriterIgnoreInvalidChars writer = new XmlTextWriterIgnoreInvalidChars(swriter, deleteInvalidChars)) {

            // -- settings --
            // unfortunately writer.Settings cannot be set, is null, so we can't specify: bool newLineOnAttributes, bool omitXmlDeclaration
            writer.Formatting = indent ? Formatting.Indented : Formatting.None;

            if (indentChar != null)
                writer.IndentChar = (char)indentChar;

            // -- write --
            xml.WriteTo(writer); 
        }

        return swriter.ToString();
    }

-- this uses the following XmlTextWriter --

public class XmlTextWriterIgnoreInvalidChars : XmlTextWriter
{
    public bool DeleteInvalidChars { get; set; }

    public XmlTextWriterIgnoreInvalidChars(TextWriter w, bool deleteInvalidChars = true) : base(w)
    {
        DeleteInvalidChars = deleteInvalidChars;
    }

    public override void WriteString(string text)
    {
        if (text != null && DeleteInvalidChars)
            text = XML.ToValidXmlCharactersString(text);
        base.WriteString(text);
    }
}
时光磨忆 2024-12-23 13:36:02

I'm on the receiving end of @parapurarajkumar's solution, where the illegal characters are being properly loaded into XmlDocument, but breaking XmlWriter when I'm trying to save the output.

My Context

I'm looking at exception/error logs from the website using Elmah. Elmah returns the state of the server at the time of the exception, in the form of a large XML document. For our reporting engine I pretty-print the XML with XmlWriter.

During a website attack, I noticed that some XML documents weren't parsing, and I was receiving this exception: '.', hexadecimal value 0x00, is an invalid character.

NON-RESOLUTION: I converted the document to a byte[] and sanitized it of 0x00, but it found none.

When I scanned the xml document, I found the following:

...
<form>
...
<item name="SomeField">
   <value
     string="C:\boot.ini&#x0;.htm" />
 </item>
...

There was the nul byte, encoded as an HTML entity!

RESOLUTION: To fix the encoding, I replaced the entity before loading the text into my XmlDocument, because loading it creates the nul byte, which is then difficult to sanitize out of the object. Here's my entire process:

XmlDocument xml = new XmlDocument();
details.Xml = details.Xml.Replace("&#x0;", "[0x00]");  // in my case I wanted to see it, otherwise just replace with ""
xml.LoadXml(details.Xml);

string formattedXml = null;

// I stuff this all in a helper function, but put it in-line for this example
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings {
    OmitXmlDeclaration = true,
    Indent = true,
    IndentChars = "\t",
    NewLineHandling = NewLineHandling.None,
};
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    xml.Save(writer);
    formattedXml = sb.ToString();
}

LESSON LEARNED: if your incoming data is HTML-encoded on entry, sanitize illegal bytes by searching for their associated HTML entities.
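
The same idea can be generalized beyond a single hard-coded entity. The sketch below scans the raw text for numeric character references (decimal like &#0; or hexadecimal like &#x0;) and replaces only those that point at characters XML 1.0 forbids. The class name CharRefSanitizer, its method names, and the "[0xNN]" marker are made up for illustration. It works on raw text, so it would also rewrite such sequences inside CDATA sections or comments.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CharRefSanitizer
{
    // Matches decimal (&#0;) and hexadecimal (&#x0;) numeric character references.
    private static readonly Regex NumericCharRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // Valid XML 1.0 code points per the Char production of the spec.
    private static bool IsValidXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces references to invalid characters with a visible marker; leaves valid ones alone.
    public static string ReplaceInvalidCharRefs(string xml)
    {
        return NumericCharRef.Replace(xml, m =>
        {
            bool isHex = m.Groups[1].Success;
            string digits = isHex ? m.Groups[1].Value : m.Groups[2].Value;
            int codePoint;
            bool parsed = int.TryParse(
                digits,
                isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None,
                CultureInfo.InvariantCulture,
                out codePoint);

            if (parsed && IsValidXmlCodePoint(codePoint))
                return m.Value; // e.g. &#65; stays as it is

            return parsed ? "[0x" + codePoint.ToString("X2") + "]" : "[invalid]";
        });
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceInvalidCharRefs("C:\\boot.ini&#x0;.htm")); // C:\boot.ini[0x00].htm
    }
}
```

Run this on the string before calling LoadXml, in the same place as the Replace call above.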

三月梨花 2024-12-23 13:36:02

There is a generic solution that works nicely:

public class XmlTextTransformWriter : System.Xml.XmlTextWriter
{
    public XmlTextTransformWriter(System.IO.TextWriter w) : base(w) { }
    public XmlTextTransformWriter(string filename, System.Text.Encoding encoding) : base(filename, encoding) { }
    public XmlTextTransformWriter(System.IO.Stream w, System.Text.Encoding encoding) : base(w, encoding) { }

    public Func<string, string> TextTransform = s => s;

    public override void WriteString(string text)
    {
        base.WriteString(TextTransform(text));
    }

    public override void WriteCData(string text)
    {
        base.WriteCData(TextTransform(text));
    }

    public override void WriteComment(string text)
    {
        base.WriteComment(TextTransform(text));
    }

    public override void WriteRaw(string data)
    {
        base.WriteRaw(TextTransform(data));
    }

    public override void WriteValue(string value)
    {
        base.WriteValue(TextTransform(value));
    }
}

Once this is in place, you can then create your override of THIS as follows:

public class XmlRemoveInvalidCharacterWriter : XmlTextTransformWriter
{
    public XmlRemoveInvalidCharacterWriter(System.IO.TextWriter w) : base(w) { SetTransform(); }
    public XmlRemoveInvalidCharacterWriter(string filename, System.Text.Encoding encoding) : base(filename, encoding) { SetTransform(); }
    public XmlRemoveInvalidCharacterWriter(System.IO.Stream w, System.Text.Encoding encoding) : base(w, encoding) { SetTransform(); }

    void SetTransform()
    {
        TextTransform = XmlUtil.RemoveInvalidXmlChars;
    }
}

where XmlUtil.RemoveInvalidXmlChars is defined as follows:

    public static string RemoveInvalidXmlChars(string content)
    {
        if (content.Any(ch => !System.Xml.XmlConvert.IsXmlChar(ch)))
            return new string(content.Where(ch => System.Xml.XmlConvert.IsXmlChar(ch)).ToArray());
        else
            return content;
    }
葬﹪忆之殇 2024-12-23 13:36:02

Can't the string be cleaned with:

System.Net.WebUtility.HtmlDecode()

?
