Efficiently reordering a large dataset to maximize memory cache effectiveness

Posted 2024-07-12 22:44:57

I've been working on a problem which I thought people might find interesting (and perhaps someone is aware of a pre-existing solution).

I have a large dataset consisting of a long list of pairs of pointers to objects, something like this:

[
  (a8576, b3295), 
  (a7856, b2365), 
  (a3566, b5464),
  ...
]

There are way too many objects to keep in memory at any one time (potentially hundreds of gigabytes), so they need to be stored on disk, but can be cached in memory (probably using an LRU cache).

I need to run through this list processing every pair, which requires that both objects in the pair be loaded into memory (if they aren't already cached there).

So, the question: is there a way to reorder the pairs in the list to maximize the effectiveness of an in-memory cache (in other words: minimize the number of cache misses)?

Notes

  1. Obviously, the re-ordering algorithm should be as fast as possible, and shouldn't depend on being able to have the entire list in memory at once (since we don't have enough RAM for that) - but it could iterate over the list several times if necessary.

  2. If we were dealing with individual objects, not pairs, then the simple answer would be to sort them. This obviously won't work in this situation because you need to consider both elements in the pair.

  3. The problem may be related to that of finding a minimum graph cut, but even if the problems are equivalent, I don't think solutions to min-cut meet

  4. My assumption is that the heuristic would stream the data off the disk, and write it back in chunks in a better order. It may need to iterate over this several times.

  5. Actually it may not just be pairs, it could be triplets, quadruplets, or more. I'm hoping that an algorithm that does this for pairs can be easily generalized.
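To compare candidate orderings, it helps to have a way of counting misses. The following is a minimal sketch, assuming an LRU cache that holds a fixed number of objects and that objects are identified by hashable ids; `count_misses` and the sample ids are hypothetical names for illustration. It accepts tuples of any length, so it also covers the triplet and quadruplet case from note 5.

```python
from collections import OrderedDict

def count_misses(groups, cache_size):
    """Count LRU cache misses when processing tuples of object ids in order."""
    cache = OrderedDict()  # keys are object ids, ordered oldest -> newest
    misses = 0
    for group in groups:
        for obj in group:
            if obj in cache:
                cache.move_to_end(obj)  # mark as most recently used
            else:
                misses += 1
                cache[obj] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return misses

# Same four pairs in two different orders (illustrative ids only).
original = [("a1", "b1"), ("a2", "b2"), ("a1", "b2"), ("a2", "b1")]
reordered = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a2", "b1")]

print(count_misses(original, cache_size=2))
print(count_misses(reordered, cache_size=2))
```

With a two-object cache, the second ordering shares one object between each pair and the next, so it incurs fewer misses than the first.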

Comments (4)

泼猴你往哪里跑 2024-07-19 22:44:57

Your problem is related to a similar one for computer graphics hardware:

When rendering indexed vertices in a triangle mesh, typically the hardware has a cache of most recently transformed vertices (~128 the last time I had to worry about it, but suspect the number is larger these days). Vertices not cached need a relatively expensive transform operation to calculate. "Mesh optimisation" to restructure triangle meshes to optimise cache usage used to be a pretty hot research topic. Googling "vertex cache optimisation" (or optimization :^) might find you some interesting material relevant to your problem. As other posters suggest, I suspect doing this effectively will depend on exploiting any inherent coherence in your data.

Another thing to bear in mind: as an LRU cache becomes overloaded it can be well worth changing to an MRU replacement strategy to at least hold some of the items in memory (rather than turning over the entire cache each pass). I seem to remember John Carmack has written some good material on this subject in connection with Direct3D texture caching strategies.
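The LRU versus MRU point can be seen in a small simulation. This is only a sketch under an assumed workload: repeated sequential passes over a set that is one item larger than the cache. The `simulate` function and its `policy` strings are hypothetical names, and the list-based cache is chosen for readability, not speed.

```python
def simulate(accesses, cache_size, policy):
    """Return the number of cache hits for policy 'lru' or 'mru'."""
    cache = []  # ordered least -> most recently used
    hits = 0
    for item in accesses:
        if item in cache:
            hits += 1
            cache.remove(item)  # re-appended below as most recent
        elif len(cache) >= cache_size:
            # LRU evicts the oldest entry, MRU evicts the newest one
            cache.pop(0 if policy == "lru" else -1)
        cache.append(item)
    return hits

accesses = list(range(5)) * 3  # three passes over 5 items

print("LRU hits:", simulate(accesses, cache_size=4, policy="lru"))
print("MRU hits:", simulate(accesses, cache_size=4, policy="mru"))
```

On this access pattern LRU evicts each item just before it is needed again and gets no hits at all, while MRU keeps most of the working set resident between passes.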

落在眉间の轻吻 2024-07-19 22:44:57

For a start, you could mmap the list. That works if there's enough address space, not memory, e.g. on 64-bit CPUs. This makes it easier to access the elements in order.

You could sort that list according to a minimum distance in cache which considers both elements, which works well if the objects are in a contiguous space. The sorting function could be something like: compare (a, b) to (c, d) = |a - c| + |b - d| (strictly speaking a Manhattan distance rather than a Hamming distance). Then you pull in slices of the object store and process according to the list.

EDIT: fixed a mistake in the distance.
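One way to get a sort order that accounts for both elements of a pair is a Z-order (Morton) key. This is a swapped-in technique, not the comparison described above. The sketch below assumes each pointer can be treated as a non-negative integer position in the object store; `interleave_bits` is a hypothetical helper. It interleaves the bits of the two positions so that pairs close in both coordinates tend to end up near each other after an ordinary sort.

```python
def interleave_bits(a, b, bits=32):
    """Morton (Z-order) key: interleave the low `bits` bits of a and b."""
    key = 0
    for i in range(bits):
        key |= ((a >> i) & 1) << (2 * i)      # bits of a go to even positions
        key |= ((b >> i) & 1) << (2 * i + 1)  # bits of b go to odd positions
    return key

# Illustrative positions only; real pointers would be offsets or ids.
pairs = [(7, 0), (0, 1), (6, 1), (1, 0), (7, 1), (0, 0)]
ordered = sorted(pairs, key=lambda p: interleave_bits(*p))
print(ordered)
```

The `sorted` call here works in memory, so for a list that does not fit in RAM the same key would have to be fed to an external sort. How much this helps depends on how much locality the positions really have.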

霓裳挽歌倾城醉 2024-07-19 22:44:57

Even though you're not just sorting this list, the general pattern of a multiway merge sort might be applicable - that is, consider some kind of (possibly recursive) breakdown of the set into smaller sets that can be dealt with in memory separately, and then a second phase where small chunks of the previously dealt-with sets can all be combined together. Even not knowing the specific nature of what you're doing with the pairs, it's safe to say that many algorithmic problems are made much more straightforward when you're dealing with sorted data (including graph problems, which might be what you have on your hands here).
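The multiway merge pattern can be sketched with the Python standard library. This is an illustrative outline, not a tuned implementation: it assumes items are picklable and that `chunk_size` items fit comfortably in memory. `external_sort`, `_write_run` and `_read_run` are hypothetical names.

```python
import heapq
import pickle
import tempfile

def _write_run(sorted_items):
    """Spill one sorted run to a temporary file and rewind it."""
    f = tempfile.TemporaryFile()
    for item in sorted_items:
        pickle.dump(item, f)
    f.seek(0)
    return f

def _read_run(f):
    """Lazily yield items back from a run file."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

def external_sort(items, key, chunk_size):
    """Sort an iterable, holding at most chunk_size items in memory per run."""
    runs, chunk = [], []
    for item in items:
        chunk.append(item)
        if len(chunk) >= chunk_size:
            runs.append(_write_run(sorted(chunk, key=key)))
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk, key=key)))
    # Second phase: heapq.merge pulls one item at a time from each run.
    yield from heapq.merge(*(_read_run(f) for f in runs), key=key)

pairs = [(5, 1), (2, 9), (8, 3), (1, 1), (7, 7)]
for pair in external_sort(pairs, key=lambda p: p, chunk_size=2):
    print(pair)
```

The `key` argument is where a locality-aware ordering would plug in. The plain tuple ordering above just keeps the example simple.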

只是在用心讲痛 2024-07-19 22:44:57

I think the answer to this question is going to depend very heavily on exactly the access pattern of the pair of objects. As you said, just sorting the pointers would be best in a simple, non-paired case. In a more complex case it may still make sense to sort by one of the halves of the pair if the pattern is such that locality for those values is more important (if, for example, these are key/value pairs and you are doing a lot of searches, locality for the keys is infinitely more important than for the values).

So, really, my answer is that this question can't be answered in a general case.

For storing your structure, what you actually want is probably a B-tree. These are designed for what you're talking about--keeping track of large collections where you don't want to (or can't) keep the whole thing in memory.
