What is the L1/L2 cache behavior of a LUT?

Posted 2024-10-05 16:49:46


Assuming a LUT of, say, 512 KB of 64-bit double types. Generally speaking, how does the CPU cache the structure in L1 or L2?

For example: if I access the middle element, does it attempt to cache the whole LUT or just some of it, say the middle element and then n subsequent elements?

What kind of algorithms does the CPU use to determine what it keeps in the L2 cache? Is there a certain look-ahead strategy it follows?

Note: I'm assuming x86, but I'd be interested in knowing how other architectures work, e.g. POWER, SPARC, etc.


Comments (2)

满身野味 2024-10-12 16:49:46


It depends on the data structure you use for the LUT (look-up table?).

Caches are at their best with things that are laid out contiguously in memory (e.g. as arrays or std::vectors) rather than scattered around.
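
As a rough illustration (a hypothetical microbenchmark, not part of the original answer; C++17 and a typical x86 machine assumed), the sketch below sums the same 512 KB of doubles once as a contiguous array and once through a pointer-chased list. On most hardware the contiguous scan wins, because each fetched 64-byte line carries 8 useful doubles and the access pattern is prefetcher-friendly:

```cpp
#include <chrono>
#include <cstdio>
#include <memory>
#include <vector>

struct Node { double value; Node* next; };  // scattered layout: one heap node per element

int main() {
    constexpr size_t kCount = 512 * 1024 / sizeof(double);  // 65,536 doubles

    // Contiguous layout: a linear scan touches each 64-byte line exactly once.
    std::vector<double> flat(kCount, 1.0);

    // Scattered layout: each element may sit on its own cache line, and the
    // pointer chase defeats hardware prefetching.
    std::vector<std::unique_ptr<Node>> owner;
    Node* head = nullptr;
    for (size_t i = 0; i < kCount; ++i) {
        owner.push_back(std::make_unique<Node>(Node{1.0, head}));
        head = owner.back().get();
    }

    auto time_sum = [](auto&& body) {
        auto t0 = std::chrono::steady_clock::now();
        double s = body();
        auto t1 = std::chrono::steady_clock::now();
        std::printf("sum=%f  %lld us\n", s,
            (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    };

    time_sum([&] { double s = 0; for (double d : flat) s += d; return s; });
    time_sum([&] { double s = 0; for (Node* n = head; n; n = n->next) s += n->value; return s; });
}
```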

In simple terms, when you access a memory location, a block of RAM (one "cache line" worth, 64 bytes on x86) is loaded into the cache, possibly evicting some previously cached data.
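
The arithmetic is worth spelling out. A minimal sketch, assuming 64-byte lines: touching one double drags in its 7 neighbours, and the 512 KB LUT from the question spans 8,192 lines in total:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    constexpr size_t kLine = 64;                         // typical x86 cache line
    constexpr size_t kPerLine = kLine / sizeof(double);  // 8 doubles per line
    constexpr size_t kCount = 512 * 1024 / sizeof(double);

    size_t i = kCount / 2;                     // "the middle element"
    size_t first = (i / kPerLine) * kPerLine;  // start of the line holding lut[i]
    std::printf("touching lut[%zu] loads lut[%zu..%zu] (one %zu-byte line)\n",
                i, first, first + kPerLine - 1, kLine);
    std::printf("the whole 512 KB LUT spans %zu cache lines\n", kCount / kPerLine);
}
```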

Generally, there are several levels of cache, forming a hierarchy. With each level, access times increase but so does capacity.
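
On Linux with glibc you can ask the system for these sizes. A small sketch; note the _SC_LEVEL*_CACHE_SIZE sysconf names are glibc extensions and may report 0 or -1 on other systems:

```cpp
#include <cstdio>
#include <unistd.h>

int main() {
    // Each call returns the size in bytes, or 0/-1 if the value is unknown.
    std::printf("L1d size: %ld bytes, line: %ld bytes\n",
                sysconf(_SC_LEVEL1_DCACHE_SIZE), sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
    std::printf("L2 size:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    std::printf("L3 size:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
}
```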

Yes, there is look-ahead (hardware prefetching), but it is governed by rather simplistic algorithms and cannot cross page boundaries (a memory page is typically 4 KB in size on x86).
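
When the hardware prefetcher cannot predict your pattern (random LUT indices, or strides that cross pages), a software hint can sometimes help. The sketch below uses the GCC/Clang builtin __builtin_prefetch; the helper name, the index-array access pattern, and the look-ahead distance of 16 are all illustrative assumptions, not fixed rules:

```cpp
#include <cstddef>

// Hypothetical helper: sum LUT entries selected by an index array,
// prefetching the entry we will need kAhead iterations from now.
double sum_with_prefetch(const double* lut, const size_t* idx, size_t n) {
    constexpr size_t kAhead = 16;  // look-ahead distance; workload-dependent guess
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + kAhead < n)
            __builtin_prefetch(&lut[idx[i + kAhead]], /*rw=*/0, /*locality=*/1);
        s += lut[idx[i]];
    }
    return s;
}
```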

I suggest that you read What Every Programmer Should Know About Memory. It has lots of great info on the subject.

慈悲佛祖 2024-10-12 16:49:46


Caches are generally formed as a collection of cache lines. Each line is aligned to its own size, so, for example, a cache with 128-byte cache lines caches data at addresses aligned to a 128-byte boundary.
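
In code form, that alignment rule amounts to masking off the low address bits. A small sketch, using the 128-byte line size from the example above:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    constexpr uintptr_t kLine = 128;  // line size from the example above
    double lut[16];
    uintptr_t addr = reinterpret_cast<uintptr_t>(&lut[5]);
    uintptr_t line_base = addr & ~(kLine - 1);  // aligned start of the cached block
    std::printf("addr %#zx lives in the line starting at %#zx\n",
                (size_t)addr, (size_t)line_base);
}
```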

CPU caches generally use some LRU eviction mechanism (least recently used, as in evict the oldest cache line on a cache miss), as well as having some mapping from a memory address to a particular set of cache lines. (This results in one of the many false sharing errors in x86 if you are trying to read from multiple addresses aligned on a 4k or 16M boundary.)
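
Here is a sketch of that set mapping for a hypothetical 32 KB, 8-way cache with 64-byte lines (hence 64 sets); the cache geometry is assumed, not taken from any particular CPU. Note how two addresses 4 KB apart land in the same set, which is the aliasing conflict the parenthetical refers to:

```cpp
#include <cstdint>
#include <cstdio>

constexpr uintptr_t kLine = 64, kWays = 8, kSize = 32 * 1024;
constexpr uintptr_t kSets = kSize / (kLine * kWays);  // 64 sets

// Which set a physical address maps to: line number modulo set count.
uintptr_t set_of(uintptr_t a) { return (a / kLine) % kSets; }

int main() {
    std::printf("set(0x10000)=%zu  set(0x11000)=%zu  (4 KB apart, same set)\n",
                (size_t)set_of(0x10000), (size_t)set_of(0x11000));
}
```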

So, when you have a cache miss, the CPU will read in a cache line of memory that includes the address range missed. If you happen to read across a cache line boundary, that means you will read in two cache lines.
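
The boundary case is easy to see numerically. In the sketch below (64-byte lines assumed), an 8-byte value placed at offset 60 spans bytes 60..67 and therefore touches two lines:

```cpp
#include <cstdio>
#include <cstring>

int main() {
    alignas(64) unsigned char buf[128] = {};  // buf starts on a 64-byte boundary
    size_t off = 60;                          // bytes 60..67 straddle the boundary
    double v;
    std::memcpy(&v, buf + off, sizeof v);     // misaligned read, done safely via memcpy
    std::printf("bytes %zu..%zu span lines %zu and %zu\n",
                off, off + sizeof v - 1, off / 64, (off + sizeof v - 1) / 64);
}
```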
