Looking for a technique to load a large number of objects into an IDictionary in .NET


I need to load about 6 million objects into a Dictionary. The problem I have is that simply adding them to the Dictionary while constructing them fragments memory, because the Dictionary keeps allocating new internal arrays and deallocating the old ones. In the end, I could only load about 2 million of them into memory because the free memory becomes fragmented. The issue is that I do not know the actual number of items in advance; it all depends on user input.

My not-so-perfect solution is this:
1. Use a linked list to store all objects once they are created. I do this because linked lists do not need contiguous free space.
2. Create a dictionary with the exact size needed, so there is no need to re-allocate the internal dictionary arrays.
3. Copy the objects over into the dictionary. This way, I can load up to 3 million (see the sketch after this list).
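
For illustration, a minimal sketch of this buffer-then-copy approach (ReadUserInput, CreateItem, and ItemValue are hypothetical placeholders, not my real code):

var buffer = new LinkedList<KeyValuePair<string, ItemValue>>();

foreach (var input in ReadUserInput())          // item count unknown up front
{
    ItemValue item = CreateItem(input);
    buffer.AddLast(new KeyValuePair<string, ItemValue>(item.Key, item));
}

// The exact count is now known, so the dictionary's internal arrays are
// allocated once instead of growing (and re-allocating) several times.
var map = new Dictionary<string, ItemValue>(buffer.Count);
foreach (var pair in buffer)
{
    map.Add(pair.Key, pair.Value);
}
buffer = null;   // the linked-list nodes can now be collected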

Any suggestions on how I can improve this? Or, are you aware of a free IDictionary implementation that does not use arrays internally?

Thank you

UPDATE: My keys are fixed-length strings, with the length depending on the value type. They are typically about 8 characters long but can be up to 20 characters, and the total possible number of items explodes as the key length increases. Fortunately, the current maximum number of items is 12M. The value is a class type of roughly 90-120 bytes in total size per instance.

This is a WinForms application running on 32-bit Windows, and my typical host machine has 2 GB of memory. Other parts of the application also waste a lot of memory; unfortunately, I cannot address those now.

Comments (3)

罪歌 2024-12-12 01:30:53

The whole fragmentation issue can be solved by using a capacity:

var d = new Dictionary<int, string>(expectedCapacity);

expectedCapacity should be estimated pessimistically, with a little room to spare.

But when the dictionary is used with reference types and/or small value types, this should not make much of a difference, so I think you should re-check your diagnosis.

Fragmentation is only an issue on the Large Object Heap, and 6 million key/value pairs (~6M * 20 bytes ≈ 120 MB) shouldn't cause that.

But do realize how a Dictionary grows: when it's full, it roughly doubles. So when loading (a little over) 8M items you could end up with capacity for 16M, with the earlier 8M, 4M, 2M, etc. arrays also having been dropped on the LOH. That can cause an OOM.

So it is well worth trying to estimate the number of items in advance.
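
As a minimal sketch of the difference (the capacity figure is just an illustrative, pessimistic estimate):

// Letting the dictionary grow: each resize allocates new internal arrays and
// drops the old ones, which is what can litter the heap with dead blocks.
var growing = new Dictionary<string, object>();

// Pre-sized: the internal arrays are allocated once, up front.
int expectedCapacity = 13000000;   // e.g. the known 12M maximum plus some slack
var presized = new Dictionary<string, object>(expectedCapacity);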

忆梦 2024-12-12 01:30:53

Would some partitioning help?

I've used an approach where I XOR the bytes of the key's GetHashCode() down to a single byte and use that byte to partition the dictionary into 256 smaller ones. Basically, you have an internal Dictionary<byte, Dictionary<K, V>> that holds the values for the outer IDictionary<K, V>.

If you started with a large dictionary class like this:

using System;
using System.Collections.Generic;
using System.Linq;

public class LargeDictionary<K, V> : IDictionary<K, V>
{
    // Up to 256 inner dictionaries, keyed by a one-byte hash of the outer key.
    private readonly Dictionary<byte, Dictionary<K, V>> _inner =
            new Dictionary<byte, Dictionary<K, V>>();

    // Folds the key's hash code into a single byte and returns the inner
    // dictionary for that partition, creating it on demand.
    private Dictionary<K, V> GetInner(K key)
    {
        var bs = BitConverter.GetBytes(key.GetHashCode());
        var prekey = (byte)(bs[0] ^ bs[1] ^ bs[2] ^ bs[3]);
        if (!_inner.ContainsKey(prekey))
        {
            _inner.Add(prekey, new Dictionary<K, V>());
        }
        return _inner[prekey];
    }

    /* See below */

}

Would you be able to start with this and possibly rebuild parts of the inner dictionary to reclaim memory as you go?

Here's the rest of the class:

    public void Add(K key, V value)
    {
        this.GetInner(key).Add(key, value);
    }

    public bool ContainsKey(K key)
    {
        return this.GetInner(key).ContainsKey(key);
    }

    public ICollection<K> Keys
    {
        get
        {
            var keys = from pk in _inner.Keys
                       from k in _inner[pk].Keys
                       select k;
            return keys.ToList();
        }
    }

    public bool Remove(K key)
    {
        return this.GetInner(key).Remove(key);
    }

    public bool TryGetValue(K key, out V value)
    {
        return this.GetInner(key).TryGetValue(key, out value);
    }

    public ICollection<V> Values
    {
        get
        {
            var values = from pk in _inner.Keys
                         from v in _inner[pk].Values
                         select v;
            return values.ToList();
        }
    }

    public V this[K key]
    {
        get
        {
            return this.GetInner(key)[key];
        }
        set
        {
            this.GetInner(key)[key] = value;
        }
    }

    public void Add(KeyValuePair<K, V> item)
    {
        this.GetInner(item.Key).Add(item.Key, item.Value);
    }

    public void Clear()
    {
        _inner.Clear();
    }

    public bool Contains(KeyValuePair<K, V> item)
    {
        var inner = this.GetInner(item.Key);
        return inner.ContainsKey(item.Key)
            && inner[item.Key].Equals(item.Value);
    }

    public void CopyTo(KeyValuePair<K, V>[] array, int arrayIndex)
    {
        var source = this.ToArray();
        Array.Copy(source, 0, array, arrayIndex, source.Length);
    }

    public int Count
    {
        get
        {
            var counts = from pk in _inner.Keys
                         select _inner[pk].Count;
            return counts.Sum();
        }
    }

    public bool IsReadOnly
    {
        get { return false; }
    }

    public bool Remove(KeyValuePair<K, V> item)
    {
        return this.GetInner(item.Key).Remove(item.Key);
    }

    public IEnumerator<KeyValuePair<K, V>> GetEnumerator()
    {
        return _inner.Keys.SelectMany(pk => _inner[pk]).GetEnumerator();
    }

    System.Collections.IEnumerator
            System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
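
A minimal usage sketch of the LargeDictionary<K, V> above (the Item class and the key value are made-up placeholders):

var dict = new LargeDictionary<string, Item>();   // Item stands in for the ~100-byte value class
dict.Add("ABCD1234", new Item());

Item found;
if (dict.TryGetValue("ABCD1234", out found))
{
    // Each inner dictionary holds roughly 1/256th of the entries, so its
    // arrays stay much smaller and fit more easily into fragmented memory.
}
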
∞梦里开花 2024-12-12 01:30:53

6 million objects sounds like a lot to keep in the memory of a program, and you probably don't need them all loaded at the same time.

Would it make sense to keep this data outside of the application, maybe in a database (possibly an embedded one such as SQLite or SQL Server Compact)?
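
For example, a minimal sketch using the System.Data.SQLite ADO.NET provider (the table layout and the SaveItem helper are made up for illustration):

using System.Data.SQLite;   // open-source SQLite ADO.NET provider

static void SaveItem(string key, byte[] payload)
{
    // Sketch only: keeps items on disk instead of holding 6M+ objects in memory.
    using (var conn = new SQLiteConnection("Data Source=items.db"))
    {
        conn.Open();

        using (var create = new SQLiteCommand(
            "CREATE TABLE IF NOT EXISTS Items (ItemKey TEXT PRIMARY KEY, Payload BLOB)", conn))
        {
            create.ExecuteNonQuery();
        }

        using (var insert = new SQLiteCommand(
            "INSERT OR REPLACE INTO Items (ItemKey, Payload) VALUES (@key, @payload)", conn))
        {
            insert.Parameters.AddWithValue("@key", key);
            insert.Parameters.AddWithValue("@payload", payload);
            insert.ExecuteNonQuery();
        }
    }
}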
