SOAP client not handling XML entities properly; "There is an error in XML document"

Posted on 2024-10-17 15:00:30

Some consumers of our WCF web service are encountering an exception when trying to parse our responses:

System.InvalidOperationException: There is an error in XML document (5, -349).
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at [Consumer's Code]

The inner exception looks like this:

'', hexadecimal value 0x0B, is an invalid character. Line 5, position -349.

   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ThrowInvalidChar(Int32 pos, Char invChar)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32 startPos, Boolean expand, BufferBuilder internalSubsetBuilder, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseCharRefInline(Int32 startPos, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlTextReader.Read()
   at System.Xml.XmlReader.ReadElementString()
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read43_TextWidgetConfig(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read45_TextWidgetInfo(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read49_WidgetInfo(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read50_InstantPageData(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read128_GetInstantPageDataResponse()
   at Microsoft.Xml.Serialization.GeneratedAssembly.ArrayOfObjectSerializer141.Deserialize(XmlSerializationReader reader)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)

The customer's data being returned somehow had vertical tab characters in it. Looking at our XML, we could see that these characters were being properly rendered as entities. A quick Google search turned up a bug in XmlSerializer where it can't handle certain entities; it has to be fixed by changing an option on the XML readers in the auto-generated proxies.
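As far as I can tell, the reader option in question is XmlTextReader.Normalization: its documentation says that setting it to false also disables character range checking for numeric character references, while true rejects a reference to U+000B (such as &#xB;) with the same kind of "is an invalid character" error shown above. The following is a minimal sketch that parses one fragment both ways; CharRefDemo is a hypothetical helper, not code from either side. In a generated SoapHttpClientProtocol proxy, the usual place to flip this setting would be an override of GetReaderForMessage in a partial class, but treat that as an assumption to check against the consumer's proxy.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: shows how XmlTextReader.Normalization changes the
// handling of a numeric character reference to U+000B (vertical tab).
public static class CharRefDemo
{
    // Returns the text content of the root element, or null if the reader rejects it.
    public static string TryReadText(string xml, bool normalization)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // true: numeric character references are range-checked (strict);
            // false: references such as &#xB; are expanded without checking.
            reader.Normalization = normalization;
            try
            {
                reader.MoveToContent();
                return reader.ReadElementContentAsString();
            }
            catch (XmlException)
            {
                return null;
            }
        }
    }
}
```

For example, TryReadText("<Text>a&#xB;b</Text>", true) returns null, while passing false returns the text with the vertical tab intact.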

The consumer acknowledges that they need to fix their client-side code, but they are unable to quickly respond to this issue with a patch. They would like us to apply a patch in our own code to filter out these forbidden characters.

  1. Is the list of problem characters for XmlSerializer documented anywhere?
  2. Is there a clean way for us to change our WCF service so that it automatically strips out these characters, without resorting to string replaces in all of our web methods?

Update:

I found the answer to #1. According to the XML spec, only certain character codes are allowed:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
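To make question #1 concrete, the production can be transcribed directly into a predicate over code points. The sketch below does exactly that; XmlCharRules and its members are hypothetical names. Within the Basic Multilingual Plane the excluded values are the C0 control characters other than tab, LF and CR, the surrogate range U+D800 to U+DFFF, and U+FFFE/U+FFFF.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a literal transcription of the XML 1.0 Char production.
public static class XmlCharRules
{
    // cp is a Unicode code point, not a UTF-16 code unit.
    public static bool IsXmlChar(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // The C0 control characters excluded by the production; these are the
    // ones most likely to arrive in user-entered data (like the 0x0B here).
    public static IEnumerable<int> ForbiddenControlChars()
    {
        return Enumerable.Range(0, 0x20).Where(cp => !IsXmlChar(cp));
    }
}
```

ForbiddenControlChars() yields 29 code points: U+0000 to U+0008, U+000B, U+000C and U+000E to U+001F.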

So it seems like the DataContractSerializer on our server is what's in error here. I'm looking into how to customize that serializer now.
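For comparison, System.Xml's general-purpose XmlWriter refuses to emit such a character unless character checking is switched off, and what it emits with checking off is exactly the kind of document a strict reader rejects. This sketch uses XmlWriter rather than DataContractSerializer, so it illustrates the failure mode rather than WCF's internal writer; LenientWriterDemo is a hypothetical name.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: contrasts strict and lenient XmlWriter character
// checking, then tests whether a default (strict) XmlReader accepts the output.
public static class LenientWriterDemo
{
    // Writes <Text>value</Text>. With checkCharacters = true, XmlWriter throws
    // ArgumentException for characters outside the XML Char production.
    public static string Write(string value, bool checkCharacters)
    {
        var settings = new XmlWriterSettings
        {
            CheckCharacters = checkCharacters,
            OmitXmlDeclaration = true
        };
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteElementString("Text", value);
        }
        return output.ToString();
    }

    // True if an XmlReader with default settings (CheckCharacters = true)
    // reads the whole document without throwing XmlException.
    public static bool StrictReaderAccepts(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Write("a\u000Bb", false) succeeds, but StrictReaderAccepts rejects its output: the writing side doesn't complain and the reading side does, which mirrors the situation in the question.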

Update 2:

It looks like the DataContractSerializer issue is known and logged in Microsoft Connect.

水晶透心 2024-10-24 15:00:30

Here is my workaround code. I'm not super happy about it; it doesn't cover all cases (though it takes care of my needs), and it feels like there should be an easier solution. I'll post it here with the hopes that someone else can make it better or that someone has an easier answer.

To work around the issue, I created a new operation behavior attribute to change the serializer to a custom serializer that would strip out characters that would be rendered as invalid XML entities:

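using System;
using System.ServiceModel.Channels;    // BindingParameterCollection
using System.ServiceModel.Description; // IOperationBehavior, OperationDescription
using System.ServiceModel.Dispatcher;  // ClientOperation, DispatchOperation

// Operation-level attribute that swaps in a serializer which strips
// characters that are not allowed in XML 1.0 documents.
public class StripInvalidXmlCharactersBehaviorAttribute 
    : Attribute, IOperationBehavior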
{
    public void AddBindingParameters(
        OperationDescription operationDescription, 
        BindingParameterCollection bindingParameters)
    {
    }

    public void ApplyClientBehavior(
        OperationDescription operationDescription, 
        ClientOperation clientOperation)
    {
        IOperationBehavior behavior =
            new StripInvalidXmlCharactersBehavior(operationDescription);
        behavior.ApplyClientBehavior(operationDescription, clientOperation);
    }

    public void ApplyDispatchBehavior(
        OperationDescription operationDescription, 
        DispatchOperation dispatchOperation)
    {
        IOperationBehavior behavior =
            new StripInvalidXmlCharactersBehavior(operationDescription);
        behavior.ApplyDispatchBehavior(
            operationDescription, dispatchOperation);
    }

    public void Validate(OperationDescription operationDescription)
    {
    }
}

The behavior itself looks like this:

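using System;
using System.Collections.Generic;
using System.Runtime.Serialization;    // XmlObjectSerializer
using System.ServiceModel.Description; // DataContractSerializerOperationBehavior
using System.Xml;                      // XmlDictionaryString

// Reuses WCF's data contract formatter plumbing, but hands it the
// character-stripping serializer instead of a plain DataContractSerializer.
internal class StripInvalidXmlCharactersBehavior 
    : DataContractSerializerOperationBehavior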
{
    public StripInvalidXmlCharactersBehavior(OperationDescription opDesc)
        : base(opDesc)
    {
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, string name, string ns, IList<Type> knownTypes)
    {
        return new InvalidXmlStrippingSerializer(type, name, ns, knownTypes);
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, XmlDictionaryString name, XmlDictionaryString ns, 
        IList<Type> knownTypes)
    {
        return new InvalidXmlStrippingSerializer(type, name, ns, knownTypes);
    }
}

And this is the serializer:

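using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices; // RuntimeHelpers
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

internal class InvalidXmlStrippingSerializer : XmlObjectSerializer
{
    private readonly DataContractSerializer _innerSerializer;

    public InvalidXmlStrippingSerializer(
        Type type, string name, string ns, IList<Type> knownTypes)
    {
        _innerSerializer = 
            new DataContractSerializer(type, name, ns, knownTypes);
    }

    public InvalidXmlStrippingSerializer(
        Type type, XmlDictionaryString name, XmlDictionaryString ns, 
        IList<Type> knownTypes)
    {
        _innerSerializer =
            new DataContractSerializer(type, name, ns, knownTypes);
    }

    public override bool IsStartObject(XmlDictionaryReader reader)
    {
        return _innerSerializer.IsStartObject(reader);
    }

    public override object ReadObject(
        XmlDictionaryReader reader, bool verifyObjectName)
    {
        return _innerSerializer.ReadObject(reader, verifyObjectName);
    }

    public override void WriteEndObject(XmlDictionaryWriter writer)
    {
        _innerSerializer.WriteEndObject(writer);
    }

    public override void WriteObjectContent(
        XmlDictionaryWriter writer, object graph)
    {
        // Note: the graph is cleaned in place, so the caller's objects change too.
        graph = fixBadStringsRecursive(
            graph, new HashSet<object>(ReferenceComparer.Instance));
        _innerSerializer.WriteObjectContent(writer, graph);
    }

    private static object fixBadStringsRecursive(
        object graph, HashSet<object> visited)
    {
        // WCF can hand us null (e.g. a null return value)
        if (graph == null)
        {
            return null;
        }

        var objType = graph.GetType();
        if (objType == typeof(string))
        {
            return removeInvalidCharacters((string)graph);
        }

        // Skip value types, and skip objects already seen so that cyclic
        // graphs (e.g. child-to-parent references) don't recurse forever.
        if (!objType.IsClass || !visited.Add(graph))
        {
            return graph;
        }

        var list = graph as IList;
        if (list != null)
        {
            // Strings are immutable, so cleaned elements must be stored back.
            for (int i = 0; i < list.Count; i++)
            {
                var item = list[i];
                var cleaned = fixBadStringsRecursive(item, visited);
                if (!ReferenceEquals(cleaned, item) && !list.IsReadOnly)
                {
                    list[i] = cleaned;
                }
            }
        }
        else if (graph is IEnumerable)
        {
            // Other collections (e.g. dictionaries) are only cleaned in place;
            // string elements inside them are NOT replaced.
            foreach (var item in (IEnumerable)graph)
            {
                fixBadStringsRecursive(item, visited);
            }
        }
        else
        {
            // Look through the public properties of the object
            // (fields marked [DataMember] are not covered)
            foreach (var prop in objType.GetProperties())
            {
                if (prop.GetIndexParameters().Length == 0
                    && prop.GetGetMethod() != null)
                {
                    var propVal = prop.GetValue(graph, null);
                    var cleaned = fixBadStringsRecursive(propVal, visited);
                    if (!ReferenceEquals(cleaned, propVal)
                        && prop.GetSetMethod() != null)
                    {
                        prop.SetValue(graph, cleaned, null);
                    }
                }
            }
        }
        return graph;
    }

    private static string removeInvalidCharacters(string source)
    {
        // This is per the W3C XML spec:
        // http://www.w3.org/TR/xml/#NT-Char
        // Characters in [#x10000-#x10FFFF] are stored in .NET strings as
        // surrogate pairs, so a well-formed pair is kept; lone surrogates,
        // other control characters and U+FFFE/U+FFFF are dropped.
        var result = new StringBuilder(source.Length);
        for (int i = 0; i < source.Length; i++)
        {
            char ch = source[i];
            if (ch == '\u0009' || ch == '\u000a' || ch == '\u000d'
                || (ch >= '\u0020' && ch <= '\ud7ff')
                || (ch >= '\ue000' && ch <= '\ufffd'))
            {
                result.Append(ch);
            }
            else if (char.IsHighSurrogate(ch) && i + 1 < source.Length
                && char.IsLowSurrogate(source[i + 1]))
            {
                result.Append(ch).Append(source[i + 1]);
                i++;
            }
        }
        // Return the original instance when nothing was removed, so callers
        // can tell by reference that no write-back is needed.
        return result.Length == source.Length ? source : result.ToString();
    }

    public override void WriteStartObject(
        XmlDictionaryWriter writer, object graph)
    {
        _innerSerializer.WriteStartObject(writer, graph);
    }

    // Tracks visited objects by reference, even if they override Equals.
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        public static readonly ReferenceComparer Instance = new ReferenceComparer();

        public new bool Equals(object x, object y)
        {
            return ReferenceEquals(x, y);
        }

        public int GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }
}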

To apply the behavior to my operation, I can now just add the attribute I created.
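Applying it looks roughly like the fragment below. IWidgetService, SampleWidget and GetSampleWidget are hypothetical names; only the StripInvalidXmlCharactersBehavior attribute comes from the code above, and the fragment needs that class (and a reference to System.ServiceModel) to compile.

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical data contract with a free-text member that may contain
// control characters entered by users.
[DataContract]
public class SampleWidget
{
    [DataMember]
    public string Text { get; set; }
}

[ServiceContract]
public interface IWidgetService
{
    [OperationContract]
    [StripInvalidXmlCharactersBehavior] // the attribute defined in this answer
    SampleWidget GetSampleWidget(int widgetId);
}
```

Because it is an operation behavior, the attribute has to be placed on every operation whose response may contain such characters.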
