Large Object Heap friendly IDictionary
We have an application that holds large numbers of objects in several Dictionary instances, some of which grow continually during the lifetime of the app (a trading application with lots of instruments and continually growing orders/trades).
We are having problems with OutOfMemoryExceptions due to fragmentation of the large object heap.
To counter this I've tried to write a 'large' dictionary, implemented as a two-level dictionary in which none of the leaf dictionaries is large enough to be allocated on the LOH. I've used a consistent hashing algorithm to avoid having to rehash the entire dictionary when a single bucket becomes too large. The consistent hashing 'circle' is a TreeDictionary from the C5 collections library.
My question is, are there any better data structures (or perhaps better implementations of the one I described) for C#?
Update
This is the implementation for the 'large' dictionary: https://gist.github.com/956621
I understand it's not foolproof, since neither the LOH threshold nor the size of each Dictionary entry (or its scaling algorithm) is in the specification. However, this is currently the best I can think of to avoid the application blowing up mid-day.
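For illustration only, here is a minimal sketch of the kind of two-level dictionary described above. It is not the code in the gist, and all names are made up; a plain sorted list of ring points stands in for the C5 TreeDictionary 'circle', and splitting an oversized leaf is only hinted at in a comment.

using System.Collections.Generic;

// Illustrative sketch only (not the gist above): entries are spread over many
// small leaf dictionaries so that no single bucket array grows large enough
// (commonly cited as roughly 85,000 bytes) to land on the Large Object Heap.
public class LargeDictionary<TKey, TValue>
{
    // Sorted ring positions; a stand-in for the consistent-hashing 'circle'.
    private readonly List<int> _points = new List<int>();
    private readonly Dictionary<int, Dictionary<TKey, TValue>> _leaves =
        new Dictionary<int, Dictionary<TKey, TValue>>();

    public LargeDictionary(int initialLeaves)
    {
        // Spread the initial leaf points evenly around the ring.
        uint step = uint.MaxValue / (uint)initialLeaves;
        for (int i = 0; i < initialLeaves; i++)
        {
            int point = unchecked((int)(step * (uint)i));
            _points.Add(point);
            _leaves[point] = new Dictionary<TKey, TValue>();
        }
        _points.Sort();
    }

    private Dictionary<TKey, TValue> LeafFor(TKey key)
    {
        // First ring point >= the key's hash, wrapping to the first point.
        int index = _points.BinarySearch(key.GetHashCode());
        if (index < 0) index = ~index;
        if (index == _points.Count) index = 0;
        return _leaves[_points[index]];
    }

    public TValue this[TKey key]
    {
        get { return LeafFor(key)[key]; }
        set { LeafFor(key)[key] = value; }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return LeafFor(key).TryGetValue(key, out value);
    }

    public bool Remove(TKey key)
    {
        return LeafFor(key).Remove(key);
    }

    // Splitting an oversized leaf would mean adding new points and moving only
    // that leaf's entries, which is exactly what consistent hashing buys you.
}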
Answers (2)
A dictionary is an unfortunate data structure when it is the largest one in your application. The hashtable is often doubled in size when it becomes too full, and that requires 150% overallocation during the resize, right at the critical time. The hashtable works superbly when it's gigantic, but it requires contiguous allocation, which stresses the heap algorithms.
You can diminish these disadvantages with multi-level hashtables, for example using a byte of the hashcode as an index into 256 hashtables. This adds some overhead, for sure, but more importantly this and other such strategies are fraught with peril: they fiddle with the randomness of the hashcodes you get and can potentially make things much, much worse performance-wise. Using this approach requires a good theoretical foundation and solid empirical testing. But it can work.
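A rough sketch of that 256-way split, with assumed names rather than a tested implementation:

using System.Collections.Generic;

// One byte of the hash code selects which small dictionary holds the entry,
// so no single bucket array has to grow to Large Object Heap size. Whether
// the keys stay well distributed depends entirely on how random that byte of
// GetHashCode() is, which is exactly the peril described above.
public class SegmentedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] _segments;

    public SegmentedDictionary()
    {
        _segments = new Dictionary<TKey, TValue>[256];
        for (int i = 0; i < _segments.Length; i++)
            _segments[i] = new Dictionary<TKey, TValue>();
    }

    private Dictionary<TKey, TValue> SegmentFor(TKey key)
    {
        // Low byte of the hash code picks the segment.
        return _segments[key.GetHashCode() & 0xFF];
    }

    public TValue this[TKey key]
    {
        get { return SegmentFor(key)[key]; }
        set { SegmentFor(key)[key] = value; }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return SegmentFor(key).TryGetValue(key, out value);
    }
}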
Another strategy is to pre-allocate the biggest data-structure for the worst case and allocate it early. No fine-grained allocation necessary but now you face the spectre of catastrophic failure if it should ever run out. It's an option.
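A sketch of what that pre-allocation strategy could look like; the Order type and the worst-case count are placeholders, not figures from the question:

using System.Collections.Generic;

public class Order { /* payload omitted */ }

public static class Caches
{
    // Assumed upper bound for the whole trading day; pick it too low and the
    // resize (with its overallocation) comes back at the worst possible moment.
    private const int WorstCaseOrderCount = 5000000;

    // Allocated once, eagerly, at application start-up. The big contiguous
    // allocation still lands on the LOH, but before the heap is fragmented
    // and without ever needing to grow during the day.
    public static readonly Dictionary<long, Order> OrdersById =
        new Dictionary<long, Order>(WorstCaseOrderCount);
}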
I think this calls for a change of algorithm.
From what I've heard and understood, the GC is quite good at compacting and defragmenting memory. So your problem stems from the simple fact that you keep too much data in memory.
How much data do you keep in memory?
Have you thought about using a database? A compact one might be enough.
Or simply tell your client that to run your app correctly he needs 16 GB of memory. And if your app needs all of those 16 GB, then there is definitely something wrong.
Edit:
Looking at your problem from a different side, and after reading your edit, I have a question: how big are your objects? Do they contain long lists or arrays? How often do you remove/add those objects?
I think the problem might not be in the dictionary itself, but in objects that are too big and are removed/added too often. Maybe using some kind of caching or pooling might be profitable. And if you use lists, then create those lists preallocated.
And maybe using immutable structs instead of mutable classes can ease the fragmentation.
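A small sketch of those last suggestions, with made-up types: a pre-sized list allocates its backing array once instead of repeatedly growing it, and a small immutable struct is stored inline in that array rather than as a separate heap object.

using System.Collections.Generic;

// Immutable value type: trades live inside the list's backing array, so
// adding or replacing them creates no separate heap objects to churn the heap.
public struct Trade
{
    public readonly long OrderId;
    public readonly double Price;
    public readonly int Quantity;

    public Trade(long orderId, double price, int quantity)
    {
        OrderId = orderId;
        Price = price;
        Quantity = quantity;
    }
}

public class Instrument
{
    // Pre-sized list: the capacity here is a placeholder; the point is that
    // the backing array is allocated once instead of being reallocated
    // (and abandoned) repeatedly as trades arrive during the day.
    public readonly List<Trade> Trades = new List<Trade>(100000);
}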