XML vs binary performance for serialization/deserialization
I'm working on a Compact Framework application and need to boost performance. The app currently works offline by serializing objects to XML and storing them in a database. Using a profiling tool I could see this was quite a big overhead, slowing the app. I thought that if I switched to binary serialization the performance would increase, but because this is not supported in the Compact Framework I looked at protobuf-net. Serialization seems quicker, but deserialization is much slower, and the app does more deserializing than serializing.
Should binary serialization be faster, and if so, what can I do to speed it up? Here's a snippet of how I'm using both XML and binary:
XML serialization:
public string Serialize(T obj)
{
    UTF8Encoding encoding = new UTF8Encoding();
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    MemoryStream stream = new MemoryStream();
    XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(stream, obj);
    stream = (MemoryStream)writer.BaseStream;
    return encoding.GetString(stream.ToArray(), 0, Convert.ToInt32(stream.Length));
}

public T Deserialize(string xml)
{
    UTF8Encoding encoding = new UTF8Encoding();
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    MemoryStream stream = new MemoryStream(encoding.GetBytes(xml));
    return (T)serializer.Deserialize(stream);
}
Protobuf-net Binary serialization:
public byte[] Serialize(T obj)
{
    byte[] raw;
    using (MemoryStream memoryStream = new MemoryStream())
    {
        Serializer.Serialize(memoryStream, obj);
        raw = memoryStream.ToArray();
    }
    return raw;
}

public T Deserialize(byte[] serializedType)
{
    T obj;
    using (MemoryStream memoryStream = new MemoryStream(serializedType))
    {
        obj = Serializer.Deserialize<T>(memoryStream);
    }
    return obj;
}
Comments (6)
I'm going to correct myself on this. Marc Gravell pointed out that the first iteration has the overhead of building the model, so I've done some tests taking the average of 1000 iterations of serialization and deserialization for both XML and binary. I ran my tests with v2 of the Compact Framework DLL first, and then with the v3.5 DLL. Here's what I got, time is in ms:
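For reference, the measurement approach described (one untimed warm-up call, then an average over many iterations) could be sketched roughly as follows. The `Benchmark` class and `BenchmarkAction` delegate are hypothetical names made up for this illustration, and `Environment.TickCount` is used only on the assumption that a higher-resolution timer may not be available on every Compact Framework version.

```csharp
using System;

// Hypothetical delegate type for the code being measured (not from the original post).
public delegate void BenchmarkAction();

public static class Benchmark
{
    // Returns the mean time per call in milliseconds.
    // The first call is made outside the timed loop so that one-off costs
    // (such as building a serialization model) are excluded.
    public static double AverageMilliseconds(BenchmarkAction action, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up, not timed

        int start = Environment.TickCount;
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        // unchecked: TickCount can wrap around on long-running devices
        int elapsed = unchecked(Environment.TickCount - start);

        return (double)elapsed / iterations;
    }
}

// Usage sketch (Deserialize and payload stand in for whatever is being measured):
// double ms = Benchmark.AverageMilliseconds(delegate { Deserialize(payload); }, 1000);
```

`Environment.TickCount` is coarse (typically several milliseconds per tick), so the average is only meaningful when the whole loop runs for much longer than one tick.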
The main expense in your method is the actual generation of the XmlSerializer class. Creating the serialiser is a time consuming process which you should only do once for each object type. Try caching the serialisers and see if that improves performance at all.
Following this advice I saw a large performance improvement in my app, which allowed me to continue to use XML serialisation.
Hope this helps.
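A minimal sketch of the caching idea, assuming one serialiser per type is enough for the app. `XmlSerializerCache` and `XmlStore<T>` are hypothetical names, not part of the original code, and the sketch uses `StringWriter`/`StringReader` rather than the `MemoryStream` approach in the question, so the XML declaration in the stored string will differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper: builds each XmlSerializer once and reuses it afterwards.
public static class XmlSerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> Cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object Sync = new object();

    public static XmlSerializer Get(Type type)
    {
        lock (Sync)
        {
            XmlSerializer serializer;
            if (!Cache.TryGetValue(type, out serializer))
            {
                // The expensive step: now paid once per type instead of once per call.
                serializer = new XmlSerializer(type);
                Cache[type] = serializer;
            }
            return serializer;
        }
    }
}

// Hypothetical wrapper mirroring the Serialize/Deserialize pair from the question.
public class XmlStore<T>
{
    public string Serialize(T obj)
    {
        using (StringWriter writer = new StringWriter())
        {
            XmlSerializerCache.Get(typeof(T)).Serialize(writer, obj);
            return writer.ToString();
        }
    }

    public T Deserialize(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            return (T)XmlSerializerCache.Get(typeof(T)).Deserialize(reader);
        }
    }
}
```

The lock keeps the dictionary safe if several threads serialise at once; if the app is single-threaded it could be dropped.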
Interesting... thoughts:
... Delegate.CreateDelegate that allows protobuf-net to access properties much faster than it can in CF 2.0
... FieldInfo.SetValue
There are a number of other things that simply don't exist in CF, so it has to make compromises in a few places. For overly complex models there is also a known issue with the generics limitations of CF. A fix is underway, but it is a big change, and is taking "a while".
For info, some metrics on regular (full) .NET comparing various formats (including XmlSerializer and protobuf-net) are here.
Have you tried creating custom serialization classes for your classes, instead of using XmlSerializer, which is a general purpose serializer (it creates a bunch of classes at runtime)? There's a tool for doing this (sgen). You run it during your build process and it generates a custom assembly that can be used in place of XmlSerializer.
If you have Visual Studio, the option is available under the Build tab of your project's properties.
Is the performance hit in serializing the objects, or writing them to the database? Since writing them is likely hitting some kind of slow storage, I'd imagine it to be a much bigger perf hit than the serialization step.
Keep in mind that the perf measurements posted by Marc Gravell are testing the performance over 1,000,000 iterations.
What kind of database are you storing them in? Are the objects serialized in memory or straight to storage? How are they being sent to the db? How big are the objects? When one is updated, do you send all of the objects to the database, or just the one that has changed? Are you caching anything in memory at all, or re-reading from storage each time?
XML is often slow to process and takes up a lot of space. There have been a number of different attempts to tackle this, and the most popular today seems to be to just drop the lot in a gzip file, as with the Open Packaging Conventions.
The W3C has shown the gzip approach to be less than optimal, and they and various other groups have been working on a better binary serialisation suitable for fast processing and compression, for transmission.
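As a rough sketch of the "drop the lot in a gzip file" approach, the XML string can be compressed before it is stored and decompressed after it is read back. `XmlGzip` is a hypothetical helper name, and the sketch assumes System.IO.Compression.GZipStream is available on the target framework. It reduces size only; the XML still has to be parsed after decompression, and the compression step itself costs CPU time.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip-compresses an XML string for storage or transmission.
public static class XmlGzip
{
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the final compressed block

            // ToArray still works after the underlying stream has been closed.
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            byte[] raw = output.ToArray();
            return Encoding.UTF8.GetString(raw, 0, raw.Length);
        }
    }
}
```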