如何使我的简单 .NET LRU 缓存更快?

发布于 2024-07-14 09:46:35 字数 3198 浏览 6 评论 0原文

昨晚和今晚,我尝试了几种不同的方法,并提出了一种与 Jeff 所提出的方法类似的方法(我什至已经按照他在更新中建议的方法进行了操作,并整合了我自己的简单 LL 实现以获得额外的收益)。 这是代码,此时它看起来不再特别干净,但我已经无数次地改变了我能做的任何事情来增强性能。

public class NewLRU2<K, V> where V : class
{
    int m_iMaxItems;
    Dictionary<K, LRUNode<K, V>> m_oMainDict;

    private LRUNode<K,V> m_oHead;
    private LRUNode<K,V> m_oTail;
    private LRUNode<K,V> m_oCurrent;

    public NewLRU2(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LRUNode<K,V>>();

        m_oHead = null;
        m_oTail = null;
    }

    public V this[K key]
    {
        get
        {
            m_oCurrent = m_oMainDict[key];

            if (m_oCurrent == m_oHead)
            {
                //do nothing
            }
            else if (m_oCurrent == m_oTail)
            {
                m_oTail = m_oCurrent.Next;
                m_oTail.Prev = null;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            else
            {
                m_oCurrent.Prev.Next = m_oCurrent.Next;
                m_oCurrent.Next.Prev = m_oCurrent.Prev;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            
            return m_oCurrent.Value;
        }
    }

    public void Add(K key, V value)
    {
        if (m_oMainDict.Count >= m_iMaxItems)
        {   
            //remove old
            m_oMainDict.Remove(m_oTail.Key);

            //reuse old
            LRUNode<K, V> oNewNode = m_oTail;
            oNewNode.Key = key;
            oNewNode.Value = value;
            
            m_oTail = m_oTail.Next;
            m_oTail.Prev = null;

            //add new
            m_oHead.Next = oNewNode;
            oNewNode.Prev = m_oHead;
            oNewNode.Next = null;
            m_oHead = oNewNode;
            m_oMainDict.Add(key, oNewNode);
        }
        else
        {
            LRUNode<K, V> oNewNode = new LRUNode<K, V>(key, value);
            if (m_oHead == null)
            {
                m_oHead = oNewNode;
                m_oTail = oNewNode;
            }
            else
            {
                m_oHead.Next = oNewNode;
                oNewNode.Prev = m_oHead;
                m_oHead = oNewNode;
            }
            m_oMainDict.Add(key, oNewNode);
        }
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}


internal class LRUNode<K,V>
{
    public LRUNode(K key, V val)
    {
        Key = key;
        Value = val;
    }

    public K Key;
    public V Value;
    public LRUNode<K, V> Next;
    public LRUNode<K, V> Prev;
}

有一些部分看起来/感觉很奇怪——比如在执行添加时重用旧节点——但我能够从它们中获得可观的性能提升。 我也对从节点上的实际属性切换到公共变量所产生的差异感到有点惊讶,但我想这就是它与这些东西的关系。 此时,上面的代码几乎完全受到字典操作的性能限制,所以我不确定是否能从混搭中得到更多。 我将继续思考这个问题并研究一些回应。

原帖的解释: 大家好。 因此,我编写了一个简单的轻量级 LRU 实现,用于压缩库(我使用它基于哈希、LZW 样式在输入中查找匹配的字节字符串),并且我正在寻找方法它更快。

Last night and tonight I tried a few different approaches and came up with one similar to the one laid out below by Jeff (I had even already done what he suggested in his update, and put together my own simple LL implementation for additional gains). Here is the code, at this point it doesn't look particularily clean anymore, but I have been over this numerous times changing anything I could to beef up performance.

public class NewLRU2<K, V> where V : class
{
    int m_iMaxItems;
    Dictionary<K, LRUNode<K, V>> m_oMainDict;

    private LRUNode<K,V> m_oHead;
    private LRUNode<K,V> m_oTail;
    private LRUNode<K,V> m_oCurrent;

    public NewLRU2(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LRUNode<K,V>>();

        m_oHead = null;
        m_oTail = null;
    }

    public V this[K key]
    {
        get
        {
            m_oCurrent = m_oMainDict[key];

            if (m_oCurrent == m_oHead)
            {
                //do nothing
            }
            else if (m_oCurrent == m_oTail)
            {
                m_oTail = m_oCurrent.Next;
                m_oTail.Prev = null;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            else
            {
                m_oCurrent.Prev.Next = m_oCurrent.Next;
                m_oCurrent.Next.Prev = m_oCurrent.Prev;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            
            return m_oCurrent.Value;
        }
    }

    public void Add(K key, V value)
    {
        if (m_oMainDict.Count >= m_iMaxItems)
        {   
            //remove old
            m_oMainDict.Remove(m_oTail.Key);

            //reuse old
            LRUNode<K, V> oNewNode = m_oTail;
            oNewNode.Key = key;
            oNewNode.Value = value;
            
            m_oTail = m_oTail.Next;
            m_oTail.Prev = null;

            //add new
            m_oHead.Next = oNewNode;
            oNewNode.Prev = m_oHead;
            oNewNode.Next = null;
            m_oHead = oNewNode;
            m_oMainDict.Add(key, oNewNode);
        }
        else
        {
            LRUNode<K, V> oNewNode = new LRUNode<K, V>(key, value);
            if (m_oHead == null)
            {
                m_oHead = oNewNode;
                m_oTail = oNewNode;
            }
            else
            {
                m_oHead.Next = oNewNode;
                oNewNode.Prev = m_oHead;
                m_oHead = oNewNode;
            }
            m_oMainDict.Add(key, oNewNode);
        }
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}


internal class LRUNode<K,V>
{
    public LRUNode(K key, V val)
    {
        Key = key;
        Value = val;
    }

    public K Key;
    public V Value;
    public LRUNode<K, V> Next;
    public LRUNode<K, V> Prev;
}

There are a few parts that look/feel wonky -- like reusing the old node when doing an add -- but I was able to get an appreciable boost in porformance out of them. I was also slightly surprised at the difference it made to switch from actual properties on the node to just public variables, but I guess that's how it goes with this stuff. At this point the code above is almost entirely performance-limited by the dictionary operations, so I'm not sure I'd get a lot more out of mashing it around. I'll continue to think on it and look into some of the responses.

Explanation From Original Post:
Hello all.
So I've written a simple lightweight LRU implementation for use in a compression library (I'm using it to find matching byte-strings in the input based on a hash, LZW-style), and I'm looking for ways to make it faster.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

套路撩心 2024-07-21 09:46:35

UPDATE #2

这减少了对链表删除进行列表遍历的需要。 它引入了一个同时具有键和值的LruCacheNode。 该密钥仅在修剪缓存时使用。 如果您编写自己的链表实现,其中每个节点本质上是一个 LruCacheNode 以及 Next 和 Back 引用,那么您可以获得更好的性能。 这就是 LinkedHashMap 的作用(请参阅 这些两个问题) 。

public class LruCache<K, V>
{
    private readonly int m_iMaxItems;
    private readonly Dictionary<K, LinkedListNode<LruCacheNode<K, V>>> m_oMainDict;
    private readonly LinkedList<LruCacheNode<K, V>> m_oMainList;

    public LruCache(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LinkedListNode<LruCacheNode<K, V>>>();
        m_oMainList = new LinkedList<LruCacheNode<K, V>>();
    }

    public V this[K key]
    {
        get
        {
            return BumpToFront(key).Value;
        }
        set
        {
            BumpToFront(key).Value = value;
        }
    }

    public void Add(K key, V value)
    {
        LinkedListNode<LruCacheNode<K, V>> newNode = m_oMainList.AddFirst(new LruCacheNode<K, V>(key, value));
        m_oMainDict.Add(key, newNode);

        if (m_oMainList.Count > m_iMaxItems)
        {
            m_oMainDict.Remove(m_oMainList.Last.Value.Key);
            m_oMainList.RemoveLast();
        }
    }

    private LruCacheNode<K, V> BumpToFront(K key)
    {
        LinkedListNode<LruCacheNode<K, V>> node = m_oMainDict[key];
        if (m_oMainList.First != node)
        {
            m_oMainList.Remove(node);
            m_oMainList.AddFirst(node);
        }
        return node.Value;
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}

internal sealed class LruCacheNode<K, V>
{
    private readonly K m_Key;
    private V m_Value;

    public LruCacheNode(K key, V value)
    {
        m_Key = key;
        m_Value = value;
    }

    public K Key
    {
        get { return m_Key; }
    }

    public V Value
    {
        get { return m_Value; }
        set { m_Value = value; }
    }
}

您必须分析一些事情,看看这是否对您的环境有所改善。

次要更新:我更新了 BumpToFront 以检查该节点是否已位于 Tim Stewart 评论的前面。

UPDATE #2

This reduces the need for list traversal on a linked list Remove. It introduces a LruCacheNode that has both the key and the value. The key is only used when you trim the cache. You could get better performance if you wrote your own linked list implementation where each node essentially is an LruCacheNode along with a Next and Back reference. This is sort of what a LinkedHashMap does (see these two questions).

public class LruCache<K, V>
{
    private readonly int m_iMaxItems;
    private readonly Dictionary<K, LinkedListNode<LruCacheNode<K, V>>> m_oMainDict;
    private readonly LinkedList<LruCacheNode<K, V>> m_oMainList;

    public LruCache(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LinkedListNode<LruCacheNode<K, V>>>();
        m_oMainList = new LinkedList<LruCacheNode<K, V>>();
    }

    public V this[K key]
    {
        get
        {
            return BumpToFront(key).Value;
        }
        set
        {
            BumpToFront(key).Value = value;
        }
    }

    public void Add(K key, V value)
    {
        LinkedListNode<LruCacheNode<K, V>> newNode = m_oMainList.AddFirst(new LruCacheNode<K, V>(key, value));
        m_oMainDict.Add(key, newNode);

        if (m_oMainList.Count > m_iMaxItems)
        {
            m_oMainDict.Remove(m_oMainList.Last.Value.Key);
            m_oMainList.RemoveLast();
        }
    }

    private LruCacheNode<K, V> BumpToFront(K key)
    {
        LinkedListNode<LruCacheNode<K, V>> node = m_oMainDict[key];
        if (m_oMainList.First != node)
        {
            m_oMainList.Remove(node);
            m_oMainList.AddFirst(node);
        }
        return node.Value;
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}

internal sealed class LruCacheNode<K, V>
{
    private readonly K m_Key;
    private V m_Value;

    public LruCacheNode(K key, V value)
    {
        m_Key = key;
        m_Value = value;
    }

    public K Key
    {
        get { return m_Key; }
    }

    public V Value
    {
        get { return m_Value; }
        set { m_Value = value; }
    }
}

You'll have to profile things to see if this is an improvement in your environment.

Minor Update: I updated BumpToFront to check to see if the node is already at the front per comment from Tim Stewart.

你列表最软的妹 2024-07-21 09:46:35

LRU 缓存的目的不就是允许您修剪缓存并丢弃最近最少使用的内容吗? :-) 我没有看到任何修剪缓存的代码。 既然您很可能希望检索用例具有高性能,而修剪用例不太重要,为什么不将列表维护转移到修剪过程呢?

IOW,只需将条目放入缓存中,但在检索时为其添加时间戳。 不要对条目重新排序,只需在使用时对其进行标记即可。 可以是真正的日期时间时间戳,也可以是类中的简单计数器,最近使用的最大数字。 然后在修剪过程中,只需遍历整棵树并删除带有最旧邮票的条目即可。

Isn't the point of a LRU cache to allow you to trim the cache and throw out the least-recently-used stuff? :-) I don't see any code to trim the cache. Since you most likely want high performance for the retrieve use-case, and the trim use-case is less important why not offload the list maintenance to the trim process?

IOW, just throw the entries into the cache, but timestamp them on retrieval. Don't reorder the entries, just mark them whenever they're used. Could be a true DateTime timestamp, or could be a simple counter in the class, highest number was most recently used. Then in the trim process just walk the whole tree and remove the entries with the oldest stamps.

三生路 2024-07-21 09:46:35

使用硬件缓存,您可以将其设置为 32 x 4,即 32 行,每行 4 个元素,而不是使用 128 个元素并维护项目 1-128 的顺序。 地址的前 5 位将确定该地址将映射到 32 行中的哪一行,然后您将仅搜索 4 项,如果找不到,则替换 4 项中最旧的一项。

这要快得多,并且 IIRC 在 10 以内1 x 128 缓存命中率的百分比。

为了进行翻译,您将拥有多个链表,而不是一个链表,因此遍历它们要快得多。 您必须有一种方法来确定特定项目映射到哪个列表。

关键是,随着列表大小的增长,尝试以完美的精度维护列表中每个元素的确切顺序所带来的回报会递减。 您甚至可能更好地使用无序列表,并在缓存未命中时随机替换任何元素。 取决于列表的大小,以及错过的惩罚与维护列表的成本。

With hardware caches, instead of having say 128 elements, and maintaining the order of items 1-128, you might have it as 32 x 4, so 32 rows of 4 elements each. The first 5 bits of an address would determine which of the 32 rows that address would map to, then you would search only the 4 items, and if not found replace the oldest of the 4.

This is much faster, and is IIRC within 10% of the hit rate of a 1 x 128 cache.

To translate, you would instead of one linked list, have multiple ones, so traversing them was much faster. You would have to have a way of determining which list a particular item mapped to.

The point being, as your list grows in size, you get diminishing returns from trying to maintain with perfect accuracy the exact order of each element in the list. You might even be better off with an unordered list, and randomly replacing any element when you have a cache miss. Depends on the size of your list, and the penalty for a miss vs the cost of maintaining the list.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文