How can I speed up MongoDB deserialization in C#?
When a query returns many results, the code takes a really long time to convert the data into .NET objects. These are basic objects with a few strings as fields. I'm not sure, but I think it's using reflection to create the instances, which is slow. Is there a way to speed this up?
The 10gen driver doesn't use reflection on a per-object basis. It uses reflection once per type to generate a serializer with Reflection.Emit, so serializing or deserializing the first object might be slow, but subsequent objects are (relatively) fast.
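To illustrate what "reflection once per type" buys you, here is a hypothetical, driver-independent sketch. It is not the 10gen driver's code: it uses System.Linq.Expressions rather than Reflection.Emit, and `Person`, `CompiledMapper<T>` and the dictionary standing in for a document are made-up names. The type is inspected once (in the static constructor), a factory and one setter per property are compiled into delegates, and every later object is built only by calling those cached delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical POCO standing in for "basic objects with a few strings as fields".
public class Person
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// Reflection runs once per closed type T, in the static constructor.
// After that, building an object only invokes compiled delegates.
public static class CompiledMapper<T> where T : class, new()
{
    private static readonly Func<T> Factory;
    private static readonly Dictionary<string, Action<T, object>> Setters =
        new Dictionary<string, Action<T, object>>();

    static CompiledMapper()
    {
        // Compile "() => new T()" once.
        Factory = Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();

        foreach (PropertyInfo prop in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip read-only properties and indexers.
            if (!prop.CanWrite || prop.GetIndexParameters().Length > 0) continue;

            // Compile "(target, value) => target.Prop = (PropType)value" once per property.
            ParameterExpression target = Expression.Parameter(typeof(T), "target");
            ParameterExpression value = Expression.Parameter(typeof(object), "value");
            BinaryExpression assign = Expression.Assign(
                Expression.Property(target, prop),
                Expression.Convert(value, prop.PropertyType));
            Setters[prop.Name] = Expression.Lambda<Action<T, object>>(assign, target, value).Compile();
        }
    }

    // Builds one object from a field-name -> value map; no reflection happens here.
    public static T Map(IReadOnlyDictionary<string, object> document)
    {
        T result = Factory();
        foreach (KeyValuePair<string, object> field in document)
        {
            if (Setters.TryGetValue(field.Key, out Action<T, object> setter))
                setter(result, field.Value);
        }
        return result;
    }
}
```

The first call to `CompiledMapper<Person>.Map` pays the reflection and compilation cost for `Person`; later calls for the same type only run the cached delegates, which matches the "first object slow, the rest fast" pattern.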
As for your question, is there any way to speed this up?
If your objects are simple (not nested documents, a few public fields, etc.), there probably isn't much you can do. You could implement a custom serializer for the class to eke out a little performance, but I doubt it would be more than a few percent.
I haven't looked into it, and Robert Stam (who also answered this question) would be the authority here, but there may be some performance to gain on multicore or multiprocessor systems by parallelizing deserialization in the driver. I haven't examined the driver code from that angle, so it may be something Robert has already pursued.
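A minimal sketch of the parallelization idea, assuming a batch of raw documents has already been fetched and that the conversion delegate is thread-safe. This is not something the driver is known to do; `ParallelDeserialization` and `DeserializeBatch` are hypothetical names, and PLINQ (`AsParallel().AsOrdered()`) is just one way to spread the work across cores while keeping results in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelDeserialization
{
    // Converts an already-fetched batch of raw documents on several cores.
    // AsOrdered() keeps the output in the same order as the input batch.
    // The deserialize delegate must be safe to call from multiple threads.
    public static List<T> DeserializeBatch<TRaw, T>(IEnumerable<TRaw> rawBatch, Func<TRaw, T> deserialize)
    {
        return rawBatch.AsParallel().AsOrdered().Select(deserialize).ToList();
    }
}
```

For objects this simple the per-document work may be too small to outweigh PLINQ's coordination overhead, so it is worth measuring before relying on it.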
In general, I think 30,000 objects in 10 seconds is pretty standard for just about any platform (SQL, Mongo, XML, etc.) that doesn't store objects directly as memory blobs, as you could in a language like C++.
EDIT:
It looks like the 10gen driver performs deserialization before it returns a cursor for you to enumerate. So if your query returns 30,000 results, all 30,000 objects have to be deserialized before the driver makes a cursor available for enumeration. I haven't looked at the jmongo driver, but I expect it does the opposite and defers deserializing each object until it is enumerated from the cursor.
The net result is that, while both probably take about the same total time to enumerate and deserialize 30,000 objects, the jmongo driver spreads deserialization across the entire enumeration, whereas the C# driver front-loads it.
The difference is subtle, but it likely explains what you are seeing.
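The eager versus deferred difference described in the edit can be shown with plain C# iterators. This is a hypothetical sketch, not driver code: raw documents are modeled as strings and `deserialize` is any conversion function you supply. `Eager` converts everything before returning, while `Lazy` uses `yield return` so each conversion happens only when the caller advances the enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class DeserializationStrategies
{
    // Eager: convert every raw document before the caller sees the first result,
    // so the whole deserialization cost is paid up front (front-loaded).
    public static List<T> Eager<T>(IEnumerable<string> rawDocuments, Func<string, T> deserialize)
    {
        var results = new List<T>();
        foreach (string raw in rawDocuments)
            results.Add(deserialize(raw));
        return results;
    }

    // Lazy: convert each raw document only when the caller asks for it,
    // so the same total cost is spread across the enumeration.
    public static IEnumerable<T> Lazy<T>(IEnumerable<string> rawDocuments, Func<string, T> deserialize)
    {
        foreach (string raw in rawDocuments)
            yield return deserialize(raw);
    }
}
```

Both strategies do the same total work; only the point at which the caller waits differs.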
The bad news is that the real "fix" would be a change to the driver. One thing you could do is break your query into chunks, querying for 10 or 100 objects at a time.
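A sketch of the chunking workaround, assuming a hypothetical `fetchChunk(skip, limit)` delegate that runs your query with skip and limit applied and returns the results as a list. Results are yielded chunk by chunk, so the caller can start working after the first 10 or 100 objects instead of waiting for all of them.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkedQuery
{
    // fetchChunk(skip, limit) is a hypothetical stand-in for a query that returns
    // at most `limit` results starting at offset `skip`.
    // Note: because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<T> InChunks<T>(Func<int, int, IReadOnlyList<T>> fetchChunk, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        int skip = 0;
        while (true)
        {
            IReadOnlyList<T> chunk = fetchChunk(skip, chunkSize);
            foreach (T item in chunk)
                yield return item;
            if (chunk.Count < chunkSize) yield break; // last (partial or empty) chunk
            skip += chunk.Count;
        }
    }
}
```

Large skip values make the server walk past every skipped document, so for big result sets a range condition on an indexed field (for example, `_id` greater than the last `_id` seen) is a common alternative to skip.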
Not sure how you are measuring. When the C# driver gets a batch of documents back from the server, it deserializes them all at once, so there might be a lag before the first document, but the rest of the documents then come through really fast. What really matters is the total throughput in documents per second, and whether it is fast enough to saturate the network link (which it should be).
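One way to tell a first-document lag from a real throughput problem is to time both while enumerating the results. `ThroughputProbe.Measure` below is a hypothetical helper that works on any `IEnumerable<T>` of query results. It only sees the enumeration, so if your driver executes the query before you start enumerating, start a separate timer before issuing the query.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ThroughputProbe
{
    // Enumerates the results once, recording how long it took for the first
    // result to become available and the overall results-per-second rate.
    public static (TimeSpan firstItem, TimeSpan total, int count, double perSecond) Measure<T>(IEnumerable<T> results)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        TimeSpan firstItem = TimeSpan.Zero;
        int count = 0;
        foreach (T item in results)
        {
            if (count == 0)
                firstItem = stopwatch.Elapsed; // time until the first MoveNext returned
            count++;
        }
        TimeSpan total = stopwatch.Elapsed;
        double perSecond = total.TotalSeconds > 0 ? count / total.TotalSeconds : 0;
        return (firstItem, total, count, perSecond);
    }
}
```

A large `firstItem` with a high `perSecond` points to batch deserialization up front; a low `perSecond` overall is the number worth comparing against your network bandwidth.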
While there are hardcoded serializers for many of the standard .NET classes, serialization of POCOs is typically handled through class maps. Reflection is used to build the class maps, but reflection is no longer needed while doing the serialization/deserialization.
You could speed up serialization/deserialization a little by writing your own hand-coded serializers for your classes (or by making your classes implement IBsonSerializable), but since the bottleneck is probably the network anyway, it probably isn't worth it.
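For a sense of what a hand-coded serializer does differently, here is a driver-independent sketch for one known class. It is only an illustration: `Contact` and `ContactMapper` are hypothetical, and a sequence of key/value pairs stands in for a BSON document. In the driver you would put the same per-field logic in a class implementing its serializer interface (or in `IBsonSerializable`) and read fields from the driver's BSON reader instead.

```csharp
using System.Collections.Generic;

// Hypothetical POCO with a few string fields.
public class Contact
{
    public string Name { get; set; }
    public string Email { get; set; }
    public string Phone { get; set; }
}

// Hand-coded mapping for one known class: field names are compile-time
// constants and each match is a direct property assignment, with no
// per-field delegate lookup as in a generic class-map approach.
public static class ContactMapper
{
    public static Contact FromDocument(IEnumerable<KeyValuePair<string, object>> document)
    {
        var contact = new Contact();
        foreach (KeyValuePair<string, object> field in document)
        {
            switch (field.Key)
            {
                case "Name": contact.Name = (string)field.Value; break;
                case "Email": contact.Email = (string)field.Value; break;
                case "Phone": contact.Phone = (string)field.Value; break;
                default: break; // unknown fields (e.g. "_id") are ignored
            }
        }
        return contact;
    }
}
```

As the answer notes, for a class this simple the gain over the class-map approach is likely to be small.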
Here is what I am using: