使用 C++ 最近最少使用的缓存

发布于 2024-09-17 07:05:08 字数 174 浏览 11 评论 0原文

我正在尝试使用 C++ 实现 LRU 缓存。我想知道实现它们的最佳设计是什么。我知道 LRU 应该提供 find()、添加元素和删除元素。删除操作应删除 LRU 元素。实现此目的的最佳 ADT 是什么 例如:如果我使用以元素为值、以时间计数器为键的映射,我可以在 O(logn) 时间内搜索,插入为 O(n),删除为 O(logn)。

I am trying to implement LRU Cache using C++ . I would like to know what is the best design for implementing them. I know LRU should provide find(), add an element and remove an element. The remove should remove the LRU element. what is the best ADTs to implement this
For ex: If I use a map with element as value and time counter as key I can search in O(logn) time, Inserting is O(n), deleting is O(logn).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

夏天碎花小短裙 2024-09-24 07:05:08

LRU 缓存的一个主要问题是几乎没有“const”操作,大多数操作都会更改底层表示(如果只是因为它们会碰撞所访问的元素)。

这当然非常不方便,因为这意味着它不是传统的 STL 容器,因此任何展示迭代器的想法都是相当复杂的:当迭代器被取消引用时,这是一个访问,它应该修改我们正在迭代的列表......天哪。

还有性能方面的考虑,包括速度和内存消耗。

不幸的是,您需要某种方法来组织队列 (LRU) 中的数据(可以从中间删除元素),这意味着您的元素必须彼此独立。当然,std::list 很适合,但它超出了您的需要。在这里,单链表就足够了,因为您不需要向后迭代列表(毕竟您只需要一个队列)。

然而,它们的一个主要缺点是它们的引用位置较差,如果您需要更快的速度,您需要为节点提供自己的自定义(池?)分配器,以便它们尽可能靠近。这也会在一定程度上减轻堆碎片。

接下来,您显然需要一个索引结构(用于缓存位)。最自然的就是转向哈希图。 std::tr1::unordered_mapstd::unordered_mapboost::unordered_map 通常是高质量的实现,有些应该可供您使用。它们还为哈希冲突处理分配额外的节点,您可能更喜欢其他类型的哈希映射,请查看维基百科的文章 关于该主题并了解各种实现技术的特征。

继续,有(明显的)线程支持。如果你不需要线程支持,那么没关系,但是如果你需要,那就有点复杂了:

  • 正如我所说,在这样的结构上几乎没有 const 操作,因此你不需要确实需要区分读/写访问
  • 内部锁定很好,但您可能会发现它不适合您的使用。内部锁定的问题在于它不支持“事务”的概念,因为它放弃了每次调用之间的锁定。如果是这种情况,请将对象转换为互斥体并提供 std::unique_ptr; lock() 方法(在调试中,您可以断言在每个方法的入口点获取锁)
  • (在锁定策略中)存在重入问题,即能够从在同一线程内,检查 Boost.Thread 以获取有关各种可用锁和互斥体的更多信息。

最后,还有错误报告问题。由于预计缓存可能无法检索您放入的数据,因此我会考虑使用“不良品味”异常。考虑指针 (Value*) 或 Boost.Optionalboost::可选)。我更喜欢 Boost.Optional 因为它的语义很清晰。

One major issue with LRU caches is that there is little "const" operations, most will change the underlying representation (if only because they bump the element accessed).

This is of course very inconvenient, because it means it's not a traditional STL container, and therefore any idea of exhibiting iterators is quite complicated: when the iterator is dereferenced this is an access, which should modify the list we are iterating on... oh my.

And there are the performances consideration, both in term of speed and memory consumption.

It is unfortunate, but you'll need some way to organize your data in a queue (LRU) (with the possibility to remove elements from the middle) and this means your elements will have to be independant from one another. A std::list fits, of course, but it's more than you need. A singly-linked list is sufficient here, since you don't need to iterate the list backward (you just want a queue, after all).

However one major drawback of those is their poor locality of reference, if you need more speed you'll need to provide your own custom (pool ?) allocator for the nodes, so that they are kept as close together as possible. This will also alleviate heap fragmentation somewhat.

Next, you obviously need an index structure (for the cache bit). The most natural is to turn toward a hash map. std::tr1::unordered_map, std::unordered_map or boost::unordered_map are normally good quality implementation, some should be available to you. They also allocate extra nodes for hash collision handling, you might prefer other kinds of hash maps, check out Wikipedia's article on the subject and read about the characteristics of the various implementation technics.

Continuing, there is the (obvious) threading support. If you don't need thread support, then it's fine, if you do however, it's a bit more complicated:

  • As I said, there is little const operation on such a structure, thus you don't really need to differentiate Read/Write accesses
  • Internal locking is fine, but you might find that it doesn't play nice with your uses. The issue with internal locking is that it doesn't support the concept of "transaction" since it relinquish the lock between each call. If this is your case, transform your object into a mutex and provide a std::unique_ptr<Lock> lock() method (in debug, you can assert than the lock is taken at the entry point of each method)
  • There is (in locking strategies) the issue of reentrance, ie the ability to "relock" the mutex from within the same thread, check Boost.Thread for more information about the various locks and mutexes available

Finally, there is the issue of error reporting. Since it is expected that a cache may not be able to retrieve the data you put in, I would consider using an exception "poor taste". Consider either pointers (Value*) or Boost.Optional (boost::optional<Value&>). I would prefer Boost.Optional because its semantic is clear.

无声静候 2024-09-24 07:05:08

实现 LRU 的最佳方法是使用 std::list 和 stdext::hash_map 的组合(希望仅使用 std,然后使用 std::map)。

  • 将数据存储在列表中,以便
    最近最少使用的
    最后并使用地图指向
    列出项目。
  • 对于“获取”,请使用地图获取
    列出地址并检索数据
    并将当前节点移动到
    首先(因为现在已经使用了)并更新地图。
  • 对于“插入”,删除最后一个元素
    从列表中添加新数据
    到前面并更新地图。

这是您可以获得的最快速度,如果您使用 hash_map,您几乎应该在 O(1) 内完成所有操作。如果使用 std::map 在所有情况下都应该花费 O(logn) 。

此处提供了一个非常好的实现

The best way to implement an LRU is to use the combination of a std::list and stdext::hash_map (want to use only std then std::map).

  • Store the data in the list so that
    the least recently used in at the
    last and use the map to point to the
    list items.
  • For "get" use the map to get the
    list addr and retrieve the data
    and move the current node to the
    first(since this was used now) and update the map.
  • For "insert" remove the last element
    from the list and add the new data
    to the front and update the map.

This is the fastest you can get, If you are using a hash_map you should almost have all the operations done in O(1). If using std::map it should take O(logn) in all cases.

A very good implementation is available here

吻泪 2024-09-24 07:05:08

这篇文章描述了几种 C++ LRU 缓存实现(一种使用 STL,一种使用 boost::bimap)。

This article describes a couple of C++ LRU cache implementations (one using STL, one using boost::bimap).

凌乱心跳 2024-09-24 07:05:08

当你说优先级时,我认为“”自然会导致增加键删除分钟

When you say priority, I think "heap" which naturally leads to increase-key and delete-min.

多像笑话 2024-09-24 07:05:08

如果可以避免的话,我根本不会让缓存对外界可见。我只是有一个集合(无论什么)并以不可见的方式处理缓存,根据需要添加和删除项目,但外部接口将与底层集合完全相同。

就实现而言,堆可能是最明显的。它的复杂性与地图大致相似,但它不是从链接的节点构建树,而是在数组中排列项目,并且“链接”是基于数组索引隐式的。这会增加缓存的存储密度并提高“真实”(物理)处理器缓存中的局部性。

I would not make the cache visible to the outside world at all if I could avoid it. I'd just have a collection (of whatever) and handle the caching invisibly, adding and removing items as needed, but the external interface would be exactly that of the underlying collection.

As far as the implementation goes, a heap is probably the most obvious. It has complexities roughly similar to a map, but instead of building a tree from linked nodes, it arranges items in an array and the "links" are implicit based on array indices. This increases the storage density of your cache and improves locality in the "real" (physical) processor cache.

萌面超妹 2024-09-24 07:05:08

我建议一个 ,也许还有一个 斐波那契堆

I suggest a heap and maybe a Fibonacci Heap

淡淡绿茶香 2024-09-24 07:05:08

我会使用 C++ 中的普通堆。

有了 #include 中的 std::make_heap (标准保证为 O(n))、std::pop_heap 和 std::push_heap,实现它绝对是小菜一碟。你只需要担心增加键。

I'd go with a normal heap in C++.

With the std::make_heap (guaranteed by the standard to be O(n)), std::pop_heap, and std::push_heap in #include, implementing it would be absolutely cake. You only have to worry about increase-key.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文