How to stop .NET XML serialization from inserting illegal characters

Posted 2024-12-17 15:24:00, word count 1635, views 1, comments 0

Anything below 0x20 (except for 0x09, 0x0a and 0x0d, i.e. tab, line feed and carriage return) cannot be included in an XML document.
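
As a minimal sketch of that rule, assuming the XML 1.0 character ranges, the check below reports the first offending character in a string. The `XmlChars` class and its method names are illustrative only, not framework APIs.

```csharp
using System;

public static class XmlChars
{
    // True if this single UTF-16 code unit may appear in XML 1.0 content on its own.
    public static bool IsLegal(char c)
    {
        return c == 0x09 || c == 0x0A || c == 0x0D
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Returns the index of the first illegal character, or -1 if there is none.
    public static int IndexOfIllegalChar(string s)
    {
        if (s == null) return -1;
        for (int i = 0; i < s.Length; i++)
        {
            // A well-formed surrogate pair encodes a code point above U+FFFF,
            // which XML 1.0 allows, so skip both halves.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++;
                continue;
            }
            if (!IsLegal(s[i])) return i;
        }
        return -1;
    }
}
```

A service could call `IndexOfIllegalChar` on outgoing strings and fail early, rather than letting the client hit the error.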

I have some data coming out of a Database and being passed as a response to a Web Service request.

The SOAP formatter happily encodes the 0x12 character (ASCII 18, Device Control 2) as &#x12;, but the response fails on the client with "hexadecimal value 0x12, is an invalid character".

<rant> What I find quite frustrating is these are two sides of the same coin, both client and service are .net apps. Why will the soap formatter write bad xml if nothing can read it?</rant>

I'd like to either

  1. Get the Xml Serialiser to handle these odd characters correctly or
  2. Have the request fail in the Web Service

I've googled and couldn't find much on this other than, a) "sanitise your Inputs" or b) "change your document structure".

a) isn't a runner, as some of this data is 20+ years old.
b) isn't much of an option either: besides our own front end, we have clients that code against the Web Service directly.

Is there something obvious I'm missing? Or is it simply a case of coding around ASCII control codes?

Thanks

Update
This is actually a problem with the XmlSerializer: the following code will serialise an invalid character to the stream, but will not deserialise it.

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}

class Program
{
    public static void Main(string[] args)
    {
        // Build a string containing the control character 0x12 (DC2).
        var myData = new MyData {Text = "hello "
                + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 })
                + " world"};

        var serializer = new XmlSerializer(typeof(MyData));

        var xmlWriter = new StringWriter();

        // Serializing to a plain TextWriter succeeds.
        serializer.Serialize(xmlWriter, myData);

        var xmlReader = new StringReader(xmlWriter.ToString());

        // Reading the same text back fails.
        var newData = (MyData)serializer.Deserialize(xmlReader); // Exception
        // hexadecimal value 0x12, is an invalid character.
    }
}
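
If the reading side is also under your control, one possible way around the failure is to deserialize through an `XmlReader` whose `XmlReaderSettings.CheckCharacters` is set to `false`, which relaxes the check for character entity references such as `&#x12;`. The sketch below assumes that behaviour; the `LenientRoundTrip` class is a hypothetical helper, and this does nothing for third-party clients that use a default reader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class MyData
{
    public string Text { get; set; }
}

public static class LenientRoundTrip
{
    // Serialize exactly as in the repro: a plain StringWriter, no character checking.
    public static string Serialize(MyData data)
    {
        var serializer = new XmlSerializer(typeof(MyData));
        using (var sw = new StringWriter())
        {
            serializer.Serialize(sw, data);
            return sw.ToString();
        }
    }

    // Deserialize through a reader that does not reject character entity
    // references to characters outside the legal XML range.
    public static MyData Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(MyData));
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            return (MyData)serializer.Deserialize(reader);
        }
    }
}
```

The payload is still not well-formed XML 1.0, so this only helps when both ends agree to be lenient.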

I can get it to choke while writing the XML by explicitly creating an XmlWriter and passing that to Serialize (I'll post that shortly as my own answer), but that still means I have to sanitize my data before sending it.
As these characters are significant, I can't just strip them. I need to encode them before transmission and decode them when read, and I'm really quite surprised that there doesn't appear to be an existing framework method to do this.
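
One way such an encode/decode pair could look is sketched below. It assumes both sides agree on a backslash escape scheme (`\uXXXX` for characters XML 1.0 forbids, `\\` for a literal backslash); the `XmlTextCodec` class and the scheme itself are illustrative choices, not a framework feature.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class XmlTextCodec
{
    // True if the UTF-16 code unit is allowed in XML 1.0 content on its own.
    private static bool IsLegal(char c)
    {
        return c == 0x09 || c == 0x0A || c == 0x0D
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Replace every character XML cannot carry with \uXXXX; double any backslash.
    public static string Encode(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c == '\\') sb.Append("\\\\");
            else if (IsLegal(c)) sb.Append(c);
            else sb.Append("\\u").Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }

    // Reverse of Encode. Throws FormatException if a \u escape is not valid hex.
    public static string Decode(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c != '\\' || i + 1 >= s.Length) { sb.Append(c); continue; }

            char next = s[i + 1];
            if (next == '\\')
            {
                sb.Append('\\');
                i++; // consume the second backslash
            }
            else if (next == 'u' && i + 5 < s.Length)
            {
                string hex = s.Substring(i + 2, 4);
                sb.Append((char)int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 5; // consume 'u' and four hex digits
            }
            else
            {
                sb.Append(c); // not an escape this codec produced; keep as-is
            }
        }
        return sb.ToString();
    }
}
```

The service would call `Encode` on each string before serializing and the client would call `Decode` after deserializing. Clients that code against the service directly would have to adopt the same convention, which is the cost of this approach.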

Comments (2)

若有似无的小暗淡 2024-12-24 15:24:00

Second: A Solution

Using the DataContractSerializer (which is used by default for WCF services) instead of the XmlSerializer works a treat.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}

class Program
{
    public static void Main(string[] args)
    {
        var myData = new MyData
        {
            Text = "hello "
                + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 })
                + " world"
        };

        var serializer = new DataContractSerializer(typeof(MyData));

        var mem = new MemoryStream();

        // Write the object, then rewind the stream and read it back.
        serializer.WriteObject(mem, myData);

        mem.Seek(0, SeekOrigin.Begin);
        MyData myData2 = (MyData)serializer.ReadObject(mem);

        Console.WriteLine("myData2 {0}", myData2.Text);
    }
}

First: A Workaround

I can get it to choke when writing the XML by using an XmlWriter, which is arguably better than the client choking on it. However, it doesn't fix the underlying problem of sending the invalid characters. For example:

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

[Serializable]
public class MyData
{
    public string Text { get; set; }
}

class Program
{
    public static void Main(string[] args)
    {
        var myData = new MyData {Text = "hello "
            + ASCIIEncoding.ASCII.GetString(new byte[] { 0x12 })
            + " world"};
        var serializer = new XmlSerializer(typeof(MyData));

        var sw = new StringWriter();

        // A writer from XmlWriter.Create checks characters by default
        // (XmlWriterSettings.CheckCharacters is true unless changed).
        using (var writer = XmlWriter.Create(sw))
        {
            serializer.Serialize(writer, myData); // Exception
            // hexadecimal value 0x12, is an invalid character
        }

        // Not reached: Serialize throws above.
        var xmlReader = new StringReader(sw.ToString());

        var newData = (MyData)serializer.Deserialize(xmlReader);

        Console.WriteLine("Text = {0}", newData.Text);
    }
}
满栀 2024-12-24 15:24:00

A combination of Binary Worrier's post with an added special-character filter works pretty well for scrubbing the object right before returning it:

public List<MyData> MyWebServiceMethod()
{
    var mydata = GetMyData();
    return Helper.ScrubObjectOfSpecialCharacters<List<MyData>>(mydata);
}

Helper class:

public static T ScrubObjectOfSpecialCharacters<T>(T obj)
{
    var serializer = new XmlSerializer(obj.GetType());

    using (StringWriter writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);

        string content = writer.ToString();

        content = FixSpecialCharacters(content);

        using (StringReader reader = new StringReader(content))
        {
            obj = (T)serializer.Deserialize(reader);
        }
    }
    return obj;
}
public static string FixSpecialCharacters(string input)
{
    if (string.IsNullOrEmpty(input)) return input;

    StringBuilder output = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        int charCode = (int)input[i];
        switch (charCode)
        {
            case 8211:
            case 8212:
                {
                    // replaces short and long hyphen
                    output.Append('-');
                    break;
                }
            default:
                {
                    if ((31 < charCode && charCode < 127) || charCode == 9)
                    {
                        output.Append(input[i]);
                    }
                    break;
                }
        }
    }
    return output.ToString();
}
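
The filter above keeps only printable ASCII and tabs, so it also drops line breaks and every non-ASCII character. Where that is too aggressive, a narrower variant might remove only the characters XML 1.0 forbids. This is a sketch, assuming `XmlConvert.IsXmlChar` (available from .NET Framework 4.0) and applied to individual string values rather than to the serialized document; `XmlScrubber` and `StripIllegalXmlChars` are hypothetical names.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlScrubber
{
    // Returns a copy of the input without characters that XML 1.0 does not allow.
    // Legal text, including line breaks and non-ASCII characters, is left alone.
    public static string StripIllegalXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var output = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsHighSurrogate(c) && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]))
            {
                // Keep well-formed surrogate pairs (code points above U+FFFF).
                output.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                output.Append(c);
            }
            // Anything else (control codes, lone surrogates, U+FFFE, U+FFFF) is dropped.
        }
        return output.ToString();
    }
}
```

Like the original filter, this discards data, so it only suits cases where the control characters carry no meaning.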