Cache-oblivious data structures and dynamic languages - effective?
I've been reading recently about cache-oblivious data structures like auxiliary buffer heaps. These data structures work by keeping their most-recently-accessed elements in cache memory so any subsequent access is also quicker.
Most of these data structures are implemented with a low-level language like C/C++. Is it worth it to try to port these data structures over to a dynamic language like Python, or does the overhead of running on a virtual machine ruin all the performance benefits of these data structures? It seems like the latter, but I thought I would ask to see if someone actually has some experience with it.
2 Answers
In C (or C++) you have fine-grained control over the exact size of each data structure. You also have the possibility of fine-grained control over storage allocation. You can, after all, extend new, use malloc directly, and otherwise lay out memory to create spatial locality.

In most dynamic languages (like Python) you have no control at all over the exact size of anything, much less its location.
In Python you may have some temporal locality, but you have little or no spatial locality.
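One way to see this contrast in CPython (a sketch, not part of the original answer): a plain list holds references to separately allocated float objects, while the standard-library array module is one of the few ways to get C-like contiguous storage of raw values.

```python
import array
import sys

# A Python list stores references, not values: each float is a separate
# heap object, so iterating the list chases pointers with no guaranteed
# spatial locality.
boxed = [float(i) for i in range(1000)]

# array.array("d", ...) stores the raw C doubles contiguously, giving
# genuine spatial locality, but only for homogeneous numeric data.
packed = array.array("d", (float(i) for i in range(1000)))

print(sys.getsizeof(boxed))           # size of the pointer table only
print(packed.itemsize * len(packed))  # 8000 bytes of contiguous doubles
```

Even here you control only the storage of the values, not where the array buffer itself lands relative to anything else.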
Temporal locality might be enhanced through simple memoization of results. This is such a common speed-up that people often include a memoization decorator to uncouple the memoization (temporal locality) from the core algorithm.
I don't think that C or C++ cache-oblivious implementations translate to dynamic languages because I don't think you have enough control. Instead, just exploit memoization for speed-ups.
http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
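A minimal sketch of the memoization decorator pattern the answer describes (functools.lru_cache is the stock version in the standard library):

```python
import functools

def memoize(func):
    """Cache results keyed by the argument tuple. Repeated calls hit the
    dict (temporal locality) instead of recomputing, and the decorator
    keeps the caching separate from the core algorithm."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def fib(n):
    # Naively exponential; memoization makes it linear in n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # returns immediately; unmemoized this would never finish
```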
One of the major points of cache-oblivious algorithms is that the size of the object does not really matter (because you don't know the size of the next level of memory paging anyway), so the inability to know the exact size is not important. At some point, the size of more than 1 object "fits" into the next memory block size. (Note: not knowing the size of the object is a significant problem for cache-aware implementations.)

Additionally, most VM memory allocators will allocate at the end of a generation space, so as long as you allocate all of your objects at the same time, you could start out ok. Unfortunately, some cache-oblivious algorithms assume that you can change the memory layout, which is usually impossible with a VM.
Another big problem is that most VM-based implementations use references for objects. So an object with three strings in it is actually 4 objects (the object itself and 3 string objects). Unless those four objects are allocated next to each other, the analysis of the algorithm breaks down.
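The reference indirection is easy to observe in CPython (the Record class below is purely illustrative): the container's reported size excludes its fields, because the strings are separate heap objects reached through references.

```python
import sys

class Record:
    # __slots__ gives the most compact layout Python offers, yet the
    # object still holds three references, not the string data itself.
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

r = Record("x" * 100, "y" * 100, "z" * 100)

# sys.getsizeof reports only the outer object; the three strings were
# allocated separately, wherever the VM chose to put them.
outer = sys.getsizeof(r)
strings = sum(sys.getsizeof(s) for s in (r.a, r.b, r.c))
print(outer, strings)  # the container is tiny compared to its payload
```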
Add in compacting garbage collectors and other "optimizations" that VMs are free to do and you have a significant gap between the control you need and the current state of research on these algorithms.