A question about caching
I have always wondered how I can control what is cached in memory.
I always thought it was not possible, at least in C++.
Until one day someone told me not to include Lua scripts in a C++ application because Lua "...is notorious for completely ruining your cache...".
That got me thinking: is there any way in C++, or any other compiled language, to control what your program caches in memory? Because if Lua can affect my cache performance, then why can't I?
If so,
i. Is it architecture dependent or OS dependent?
ii. Can you access what is in the cache, or what has been cached?
Just to be clear, I am talking about the CPU cache.
3 Answers
The CPU will cache whatever data it needs, and because the cache's size is limited, when it has to load something new it drops whatever was least recently used.
Basically you don't have direct control over it, but you do have some indirect control:
What you have to know is that CPUs use cache lines. Each cache line is a small block of memory, typically 64 bytes.
So if the CPU needs some data, it will fetch the whole block. Therefore, if you have data that is used very frequently but would normally be scattered around memory, you can put it inside a struct, for example, so that the CPU cache is used more effectively (you cache fewer things that aren't really needed). Note: 99.99% of the time you don't need this kind of optimization.
A more useful example is walking through a 2D array that doesn't fit into cache. If you walk it linearly, you will load each cache line once, process it, and at some point later the CPU will drop it. If you use the indexes the wrong way around, each cache line will be loaded multiple times, and because main memory access is slow, your code will be a lot slower. The CPU can also prefetch better if you walk linearly (the direction doesn't matter).
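The two traversal orders can be sketched as follows (N is an assumed size; the slowdown only shows up once N*N ints exceed your cache):

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024; // assumed; enlarge past your cache to see the effect

// Row-major walk over row-major data: consecutive addresses, so each
// cache line is loaded once, fully used, then dropped.
long long sum_row_major(const std::vector<int>& a) {
    long long s = 0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            s += a[i * N + j];
    return s;
}

// Column-major walk over the same row-major data: a stride of N ints,
// so each cache line contributes one element per visit and may be
// evicted and reloaded many times before the walk is done.
long long sum_col_major(const std::vector<int>& a) {
    long long s = 0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            s += a[i * N + j];
    return s;
}
```

Both functions compute exactly the same sum; only the order of memory accesses differs.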
Cache performance can also be ruined by calling an external library that needs a lot of data and/or code: your main program's code and data will be dropped from the caches, and when the call finishes the CPU has to load them again.
If you do heavy optimization and want to know how you are utilizing the L1/L2/... caches, you can run a simulation. Valgrind has an excellent module called Cachegrind which does exactly that.
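A typical session might look like this (assuming the program to profile lives in a file called `walk.cpp`; `cg_annotate` ships with Valgrind):

```shell
# Build with optimizations and debug info so misses map back to source lines
g++ -O2 -g walk.cpp -o walk

# Run under the cache simulator; writes cachegrind.out.<pid>
valgrind --tool=cachegrind ./walk

# Summarize hits/misses per function and source line
cg_annotate cachegrind.out.*
```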
The CPU cache is normally split into multiple independent caches. On most modern CPUs there are normally three: an instruction cache, a data cache, and a translation lookaside buffer (TLB).
As yi_H says: you don't have direct control over it, but you do have indirect control.
So there are multiple reasons for poor cache performance.
Common ones are:
This normally results in thrashing, where the CPU mainly sits idle, waiting for data to process.
If you want to influence your CPU cache performance, you need to make the instruction and data working sets of each performance-critical area of your application as small as possible, no matter what OS/language your application is written in.
As to your questions:
i. Yes.
ii. No.
On most platforms, no, you cannot directly control what gets cached. In general, whenever you read from some memory address, the content of that memory will get copied into the cache, unless the content you need is already in cache.
When they talk about "ruining your cache", what they really mean is "ruining your performance". Reading off-chip memory is slow (high latency); reading cache is fast (low latency). If you access memory in a stupid pattern, you will be constantly overwriting the contents of cache (i.e. "cache misses"), rather than re-using what's already in cache (i.e. "cache hits") and minimising reads from off-chip memory.
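The performance gap between the two patterns can be demonstrated with a small timing sketch (the array size and stride below are assumptions; make the array larger than your last-level cache to see the full effect):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum every element of `data`, visiting them with the given stride:
// step == 1 walks linearly (mostly cache hits); a large step jumps
// across cache lines so almost every access misses. Both orders read
// each element exactly once, so the sums are identical -- only the
// elapsed time differs.
long long strided_sum(const std::vector<int>& data, std::size_t step,
                      double& seconds) {
    auto t0 = std::chrono::steady_clock::now();
    long long s = 0;
    for (std::size_t start = 0; start < step; ++start)
        for (std::size_t i = start; i < data.size(); i += step)
            s += data[i];
    auto t1 = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(t1 - t0).count();
    return s;
}
```

Calling `strided_sum(data, 1, t)` and `strided_sum(data, 4096, t)` on a vector larger than the cache returns the same sum both times, but the strided walk is typically several times slower.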