SOAP client can't properly handle XML entities; encounters "There is an error in XML document"
Some consumers of our WCF web service are encountering an exception when trying to parse our responses:
System.InvalidOperationException: There is an error in XML document (5, -349).
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at [Consumer's Code]
The inner exception looks like this:
'', hexadecimal value 0x0B, is an invalid character. Line 5, position -349.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ThrowInvalidChar(Int32 pos, Char invChar)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32 startPos, Boolean expand, BufferBuilder internalSubsetBuilder, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseCharRefInline(Int32 startPos, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlTextReader.Read()
   at System.Xml.XmlReader.ReadElementString()
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read43_TextWidgetConfig(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read45_TextWidgetInfo(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read49_WidgetInfo(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read50_InstantPageData(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read128_GetInstantPageDataResponse()
   at Microsoft.Xml.Serialization.GeneratedAssembly.ArrayOfObjectSerializer141.Deserialize(XmlSerializationReader reader)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
The customer's data being returned somehow had vertical tab characters in it. Looking at our XML, we could see that these characters were being properly rendered as numeric character entities. A quick Google search turned up a bug in XmlSerializer where it can't handle certain entities, which has to be fixed by changing an option in the auto-generated proxies' XML readers.
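For reference, the client-side option in question is most likely `XmlTextReader.Normalization`: when it is off, the reader no longer range-checks numeric character entities. A minimal sketch of how a consumer might apply it, assuming an ASMX-style proxy derived from `SoapHttpClientProtocol` (as the stack trace suggests); `WidgetServiceProxy` is a hypothetical name standing in for the generated proxy class:

```csharp
using System.Web.Services.Protocols;
using System.Xml;

// Hypothetical partial class; the other half would be the wsdl.exe-generated proxy.
public partial class WidgetServiceProxy : SoapHttpClientProtocol
{
    protected override XmlReader GetReaderForMessage(SoapClientMessage message, int bufferSize)
    {
        XmlReader reader = base.GetReaderForMessage(message, bufferSize);

        // The base class normally hands back an XmlTextReader; leave anything else alone.
        var textReader = reader as XmlTextReader;
        if (textReader != null)
        {
            // With normalization off, character range checking for numeric
            // entities (such as the one for 0x0B) is turned off as well.
            textReader.Normalization = false;
        }
        return reader;
    }
}
```

Because this lives in a separate partial class file, it would survive regeneration of the proxy.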
The consumer acknowledges that they need to fix their client-side code, but they are unable to quickly respond to this issue with a patch. They would like us to apply a patch in our own code to filter out these forbidden characters.
1. Is the list of problem characters for XmlSerializer documented anywhere?
2. Is there a clean way for us to change our WCF service so that we can automatically strip out characters without resorting to doing string replaces in all of our web methods?
Update:
I found the answer to #1. According to the XML spec, only certain character codes are allowed:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
So it seems like the DataContractSerializer on our server is what's in error here. I'm looking into how to customize that serializer now.
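The Char production above translates fairly directly into a filter. A minimal sketch, assuming a standalone helper (the `XmlCharFilter` class and its method names are made up for illustration); it works on code points so that characters above #xFFFF, which .NET strings store as surrogate pairs, are kept:

```csharp
using System;
using System.Text;

public static class XmlCharFilter
{
    // True if the Unicode code point matches the XML 1.0 Char production.
    public static bool IsValidXmlChar(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed character dropped.
    public static string Strip(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // A well-formed surrogate pair encodes one code point >= #x10000,
            // which is always allowed, so copy both halves.
            if (char.IsHighSurrogate(c) && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]))
            {
                sb.Append(c).Append(input[++i]);
            }
            else if (IsValidXmlChar(c))
            {
                // Lone surrogates fall outside every range above and are dropped.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `XmlCharFilter.Strip("Line 1\u000BLine 2")` returns `"Line 1Line 2"`, while tabs, carriage returns and line feeds pass through unchanged.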
Update 2:
It looks like the DataContractSerializer issue is known and logged in Microsoft Connect.
Here is my workaround code. I'm not super happy about it; it doesn't cover all cases (though it takes care of my needs), and it feels like there should be an easier solution. I'll post it here with the hopes that someone else can make it better or that someone has an easier answer.
To work around the issue, I created a new operation behavior attribute to change the serializer to a custom serializer that would strip out characters that would be rendered as invalid XML entities:
The behavior itself looks like this:
And this is the serializer:
To apply the behavior to my operation, I can now just add the attribute I created.
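A minimal sketch of how the three pieces described above (attribute, behavior, serializer) might fit together. This is an illustration and not the original code: every type name here is hypothetical, it is untested against a live service, and it strips characters by wrapping the XML writer, which is one possible mechanism among several. It assumes .NET Framework WCF and C# 6 syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;
using System.Text;
using System.Xml;

// Hypothetical attribute: swaps the stock serializer behavior on one operation.
[AttributeUsage(AttributeTargets.Method)]
public sealed class StripInvalidXmlCharsAttribute : Attribute, IOperationBehavior
{
    public void AddBindingParameters(OperationDescription d, BindingParameterCollection p) { }
    public void Validate(OperationDescription d) { }
    public void ApplyClientBehavior(OperationDescription d, ClientOperation op) { Swap(d); }
    public void ApplyDispatchBehavior(OperationDescription d, DispatchOperation op) { Swap(d); }

    private static void Swap(OperationDescription d)
    {
        var stock = d.Behaviors.Find<DataContractSerializerOperationBehavior>();
        if (stock == null || stock is SanitizingOperationBehavior) return;
        d.Behaviors.Remove(stock);
        d.Behaviors.Add(new SanitizingOperationBehavior(d));
    }
}

// The behavior: hands WCF a wrapping serializer instead of the plain one.
public class SanitizingOperationBehavior : DataContractSerializerOperationBehavior
{
    public SanitizingOperationBehavior(OperationDescription d) : base(d) { }
    public override XmlObjectSerializer CreateSerializer(Type t, string name, string ns, IList<Type> known)
        => new SanitizingSerializer(base.CreateSerializer(t, name, ns, known));
    public override XmlObjectSerializer CreateSerializer(Type t, XmlDictionaryString name, XmlDictionaryString ns, IList<Type> known)
        => new SanitizingSerializer(base.CreateSerializer(t, name, ns, known));
}

// The serializer: delegates to the inner serializer but filters written content.
public class SanitizingSerializer : XmlObjectSerializer
{
    private readonly XmlObjectSerializer inner;
    public SanitizingSerializer(XmlObjectSerializer inner) { this.inner = inner; }
    public override bool IsStartObject(XmlDictionaryReader r) => inner.IsStartObject(r);
    public override object ReadObject(XmlDictionaryReader r, bool verifyName) => inner.ReadObject(r, verifyName);
    public override void WriteStartObject(XmlDictionaryWriter w, object graph) => inner.WriteStartObject(w, graph);
    public override void WriteEndObject(XmlDictionaryWriter w) => inner.WriteEndObject(w);
    public override void WriteObjectContent(XmlDictionaryWriter w, object graph)
        => inner.WriteObjectContent(XmlDictionaryWriter.CreateDictionaryWriter(new SanitizingWriter(w)), graph);
}

// Pass-through XmlWriter that drops disallowed characters from text it is given.
public class SanitizingWriter : XmlWriter
{
    private readonly XmlWriter w;
    public SanitizingWriter(XmlWriter inner) { w = inner; }

    // Keeps #x9, #xA, #xD and #x20-#xFFFD; surrogate halves pass through unchecked.
    private static string Clean(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            if (c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD)) sb.Append(c);
        return sb.ToString();
    }

    public override void WriteString(string text) => w.WriteString(Clean(text));
    public override void WriteChars(char[] b, int i, int n) => w.WriteString(Clean(new string(b, i, n)));
    // Everything below is forwarded unchanged.
    public override WriteState WriteState => w.WriteState;
    public override void Flush() => w.Flush();
    public override string LookupPrefix(string ns) => w.LookupPrefix(ns);
    public override void WriteStartDocument() => w.WriteStartDocument();
    public override void WriteStartDocument(bool standalone) => w.WriteStartDocument(standalone);
    public override void WriteEndDocument() => w.WriteEndDocument();
    public override void WriteDocType(string name, string pubid, string sysid, string subset) => w.WriteDocType(name, pubid, sysid, subset);
    public override void WriteStartElement(string prefix, string localName, string ns) => w.WriteStartElement(prefix, localName, ns);
    public override void WriteEndElement() => w.WriteEndElement();
    public override void WriteFullEndElement() => w.WriteFullEndElement();
    public override void WriteStartAttribute(string prefix, string localName, string ns) => w.WriteStartAttribute(prefix, localName, ns);
    public override void WriteEndAttribute() => w.WriteEndAttribute();
    public override void WriteCData(string text) => w.WriteCData(text);
    public override void WriteComment(string text) => w.WriteComment(text);
    public override void WriteProcessingInstruction(string name, string text) => w.WriteProcessingInstruction(name, text);
    public override void WriteEntityRef(string name) => w.WriteEntityRef(name);
    public override void WriteCharEntity(char ch) => w.WriteCharEntity(ch);
    public override void WriteWhitespace(string ws) => w.WriteWhitespace(ws);
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) => w.WriteSurrogateCharEntity(lowChar, highChar);
    public override void WriteRaw(char[] b, int i, int n) => w.WriteRaw(b, i, n);
    public override void WriteRaw(string data) => w.WriteRaw(data);
    public override void WriteBase64(byte[] b, int i, int n) => w.WriteBase64(b, i, n);
}
```

With these in place, an operation would be opted in by adding `[StripInvalidXmlChars]` next to its `[OperationContract]` attribute. Note that the replacement behavior is constructed fresh, so any settings customized on the stock behavior (for example `MaxItemsInObjectGraph`) would need to be copied across, and CDATA or raw writes are not filtered.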