C#: Deserializing DataContracts from a file
I have a list of XML messages, specifically DataContract messages, that I record to a file, and I am trying to deserialize them from the file one by one. I do not want to read the whole file into memory at once because I expect it to be very large.
I already have a working implementation of this. It reads bytes from a FileStream, uses a regular expression to find the end of each element, and then passes that element to DataContractSerializer to get the actual object.
But I was told I should be using higher-level code for this task, and it seems like that should be possible. I have the following code, which I think should work, but it doesn't.
FileStream readStream = File.OpenRead(filename);
DataContractSerializer ds = new DataContractSerializer(typeof(MessageType));
MessageType msg;
while ((msg = (MessageType)ds.ReadObject(readStream)) != null)
{
Console.WriteLine("Test " + msg.Property1);
}
The code above is fed an input file that looks roughly like this:
<MessageType>....</MessageType>
<MessageType>....</MessageType>
<MessageType>....</MessageType>
It appears that I can read and deserialize the first element correctly, but after that it fails with:
System.Runtime.Serialization.SerializationException was unhandled
Message=There was an error deserializing the object of type MessageType. The data at the root level is invalid. Line 1, position 1.
Source=System.Runtime.Serialization
I have read somewhere that this is due to the way DataContractSerializer handles '\0' padding at the end of the data, but I couldn't figure out how to fix it when reading from a stream without locating the end of each MessageType tag in some other way. Is there another serialization class I should be using, or perhaps a way around this problem?
Thanks!
Comments (3)
When you're deserializing the data from the file, WCF by default uses a reader that can only consume proper XML documents. The document you're reading isn't one: it contains multiple root elements, so it's effectively a fragment. You can change the reader the serializer uses by calling another overload of ReadObject, passing it a reader that accepts fragments (configured through an XmlReaderSettings object). Alternatively, you can put some sort of wrapping element around the <MessageType> elements and read until the reader is positioned at the end element of the wrapper.
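The example that originally accompanied this answer is not included here, so the following is only a minimal sketch of the fragment-reader approach, not the answerer's own code. It assumes a hypothetical MessageType contract with an empty XML namespace and a single Property1 member (the real contract is not shown in the question), and the MessageLog.ReadMessages helper name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the question's MessageType; the real contract is not shown.
[DataContract(Name = "MessageType", Namespace = "")]
public class MessageType
{
    [DataMember]
    public string Property1 { get; set; }
}

public static class MessageLog
{
    // Lazily yields one message at a time, so the whole file is never held in memory.
    public static IEnumerable<MessageType> ReadMessages(Stream input)
    {
        var serializer = new DataContractSerializer(typeof(MessageType));
        var settings = new XmlReaderSettings
        {
            // Fragment mode lets the reader accept several root-level elements.
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreWhitespace = true
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // MoveToContent skips whitespace and comments; it returns None at end of input.
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                // ReadObject consumes one element and leaves the reader just past it.
                yield return (MessageType)serializer.ReadObject(reader);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory sample standing in for the log file from the question.
        string xml =
            "<MessageType><Property1>a</Property1></MessageType>\n" +
            "<MessageType><Property1>b</Property1></MessageType>";

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            foreach (MessageType msg in MessageLog.ReadMessages(stream))
            {
                Console.WriteLine("Test " + msg.Property1);
            }
        }
    }
}
```

For a real log file, the MemoryStream would be replaced by File.OpenRead(filename). Because a single XmlReader is shared across all ReadObject calls, the serializer continues from where the previous message ended instead of trying to parse the rest of the stream as a new document.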
Maybe your file contains a BOM (byte order mark).
That is common in UTF-8 encoded files.