What is the cost of a memory access?
We like to think that a memory access is fast and constant, but on modern architectures/OSes, that's not necessarily true.
Consider the following C code:
int i = 34;
int *p = &i;
// do something that may or may not involve i and p
{...}
// 3 days later:
*p = 643;
What is the estimated cost of this last assignment in CPU instructions, if
- i is in the L1 cache,
- i is in the L2 cache,
- i is in the L3 cache,
- i is in RAM proper,
- i is paged out to an SSD,
- i is paged out to a traditional disk?

Where else can i be?
Of course the numbers are not absolute, but I'm only interested in orders of magnitude. I tried searching the webs, but Google did not bless me this time.
Here are some hard numbers demonstrating that exact timings vary from one CPU family and version to another: http://www.agner.org/optimize/

These numbers are a good guide:

And as an infographic to give you the orders of magnitude:

(src: http://news.ycombinator.com/item?id=702713)
Norvig has some figures from 2001. Things have changed somewhat since then, but I think the relative speeds are still roughly correct.
It could also be in a CPU register. The C/C++ keyword "register" hints to the compiler that the variable should be kept in a register, but there is no guarantee it will stay there, or ever get there at all.
As long as the cache/RAM/hard disk/SSD is not busy serving other accesses (e.g. DMA requests) and the hardware is reasonably reliable, the cost is still constant (though it may be a large constant).
When you get a cache miss and the page has to be read back from the hard disk, the cost is huge: the CPU takes a page fault into the kernel, the kernel issues a read request to the disk, waits for the disk to transfer the data into RAM, and then the data travels from RAM into the cache and a register. However, this is still a constant cost.
The actual numbers and proportions will vary depending on your hardware and on how well its components match (e.g. if your CPU runs at 2000 MHz while your RAM delivers data at 333 MHz, they don't sync very well). The only way to figure this out is to test it in your program.
And this is not premature optimization, this is micro-optimization. Let the compiler worry about these kinds of details.
These numbers change all the time. But for rough estimates for 2010, Kathryn McKinley has nice slides on the web, which I don't feel compelled to copy here.
The search term you want is "memory hierarchy" or "memory hierarchy cost".
i and *i are different things, and both of them can be located in any of the locations in your list. Additionally, the pointer address might still be stored in a CPU register when the assignment is made, so it doesn't need to be fetched from RAM/cache/…

Regarding performance: this is highly CPU-dependent. Thinking in orders of magnitude, accessing RAM is worse than accessing cache entries, and accessing swapped-out pages is the worst of all. All of these are a bit unpredictable because they also depend on other factors (e.g. other processors, depending on the system architecture).