Understanding CPU cache and cache lines

Posted 2024-10-17 16:01:20, 684 words, 5 views, 0 comments

I am trying to understand how a CPU cache operates. Let's say we have this configuration (as an example).

  • Cache size 1024 bytes
  • Cache line 32 bytes
  • 1024/32 = 32 cache lines altogether.
  • A single cache line can store 32/4 = 8 ints.

1) According to this configuration, the length of the tag should be 32-5=27 bits, and the size of the index 5 bits (2^5 = 32 addresses, one for each byte in a cache line).

If the total cache size is 1024 bytes and there are 32 cache lines, where are the tags and indexes stored? (That is another 4*32 = 128 bytes.) Does it mean that the actual size of the cache is 1024+128 = 1152 bytes?

2) If the cache line is 32 bytes in this example, this means that 32 bytes get copied into the cache whenever the CPU needs to get a new byte from RAM. Am I right to assume that the position of the requested byte within the cache line will be determined by its address?

This is what I mean: if the CPU requests the byte at [FF FF 00 08], then an available cache line will be filled with the bytes from [FF FF 00 00] to [FF FF 00 1F], and our requested single byte will be at position [08].
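The arithmetic in this example can be checked with a minimal sketch. It assumes line fills are aligned to the 32-byte line size; the function name `line_range` is made up for illustration.

```python
LINE_SIZE = 32  # bytes per cache line, taken from the example configuration

def line_range(addr, line_size=LINE_SIZE):
    """Return (first byte, last byte, offset in line) for an aligned line fill."""
    offset = addr & (line_size - 1)  # low 5 bits of the address for a 32-byte line
    base = addr - offset             # line-aligned start address
    return base, base + line_size - 1, offset

base, last, offset = line_range(0xFFFF0008)
print(f"{base:#010x} .. {last:#010x}, offset {offset}")
# prints: 0xffff0000 .. 0xffff001f, offset 8
```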

3) If the previous statement is correct, does it mean that the 5 bits used for the index are technically not needed, since all 32 bytes are in the cache line anyway?

Please let me know if I got something wrong.
Thanks.

Comments (3)

撩起发的微风 2024-10-24 16:01:20

A cache consists of data RAM and tag RAM, arranged as a compromise between access time, efficiency, and physical layout. You're missing an important parameter: the number of ways (the associativity). You rarely see 1-way (direct-mapped) caches, because they perform pathologically badly with simple access patterns. Anyway:

1) Yes, tags take extra space. This is part of the design compromise: you don't want tags to be a large fraction of the total area, which is why the line size isn't just 1 byte or 1 word. Also, all tags for an index are accessed simultaneously, which can affect efficiency and layout if there is a large number of ways. The size is slightly bigger than your estimate. There are usually also a few extra bits to mark validity and sometimes hints. More ways and smaller lines mean a larger fraction is taken up by tags, so generally lines are large (32+ bytes) and the number of ways is small (4-16).
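To get a feel for how line size and associativity change the tag overhead, here is a rough sketch. It assumes 32-bit addresses, power-of-two sizes, and a single valid bit per line as the only extra metadata; the function `tag_overhead` and its parameters are illustrative and do not describe any particular CPU.

```python
def tag_overhead(cache_bytes, line_bytes, ways, addr_bits=32, extra_bits=1):
    """Return (tag bits per line, total metadata bits, metadata / data ratio).

    Assumes all sizes are powers of two. extra_bits models per-line state
    such as a valid bit (an assumption; real designs often keep more).
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    tag_bits = addr_bits - index_bits - offset_bits
    meta_bits = lines * (tag_bits + extra_bits)
    return tag_bits, meta_bits, meta_bits / (cache_bytes * 8)

for line_bytes, ways in [(32, 1), (32, 4), (4, 1)]:
    tag, meta, ratio = tag_overhead(1024, line_bytes, ways)
    print(f"line={line_bytes:2d}B ways={ways}: tag={tag} bits, "
          f"metadata={meta} bits ({ratio:.1%} of data)")
```

Under these assumptions, shrinking the line from 32 bytes to 4 bytes raises the metadata from roughly 9% to roughly 72% of the data storage, which illustrates why lines are not made a single word wide. Real caches typically keep additional state (dirty bits, replacement information, parity or ECC), which this sketch leaves out.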

2) Yes. Some caches also do a "critical word first" fetch, where they start with the word that caused the line fill and then fetch the rest. This reduces the number of cycles the CPU spends waiting for the data it actually asked for. Some caches will "write through" and not allocate a line if you miss on a write, which avoids having to read the entire cache line before writing to it (this isn't always a win).
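As an illustration of the "critical word first" idea, the sketch below lists the order in which the words of a line could arrive. It assumes 4-byte words and a simple wrap-around order after the requested word; actual hardware may order the remaining words differently, and `fill_order` is a hypothetical name.

```python
def fill_order(miss_addr, line_size=32, word_size=4):
    """Word addresses in the order a critical-word-first fill might fetch them."""
    offset = miss_addr % line_size
    base = miss_addr - offset
    words = line_size // word_size
    first = offset // word_size  # index of the word that caused the miss
    # Start at the requested word, then wrap around the rest of the line.
    return [base + ((first + i) % words) * word_size for i in range(words)]

print([hex(a) for a in fill_order(0xFFFF0008)])
```

For the address from the question, the word at 0xFFFF0008 is delivered first and the words at 0xFFFF0000 and 0xFFFF0004 arrive last.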

3) The tags won't store the lower 5 bits, as they're not needed to match a cache line. Those bits just select a byte within an individual line.
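A small sketch of how an address could be split for the configuration in the question. It assumes a direct-mapped cache with 32-bit addresses, so 5 offset bits select the byte within the line, 5 index bits select one of the 32 lines, and the remaining 22 bits form the tag. With more ways there would be fewer index bits and a longer tag.

```python
OFFSET_BITS = 5  # 32-byte line: selects a byte within the line
INDEX_BITS = 5   # 32 lines, assuming a direct-mapped (1-way) cache

def split_address(addr):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # only this part is stored in tag RAM
    return tag, index, offset

tag, index, offset = split_address(0xFFFF0008)
print(f"tag={tag:#x} index={index} offset={offset}")
# prints: tag=0x3fffc0 index=0 offset=8
```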

Wikipedia has a pretty good, if a bit intense, write-up on caches: http://en.wikipedia.org/wiki/CPU_cache - see "Implementation". There's a diagram of how data and tags are split. Me, I think everyone should learn this stuff because you really can improve performance of code when you know what the underlying machine is actually capable of.

姐不稀罕 2024-10-24 16:01:20
  1. The cache metadata is typically not counted as a part of the cache itself. It might not even be stored in the same part of the CPU (it could be in another cache, implemented using special CPU registers, etc).
  2. This depends on whether your CPU will fetch unaligned addresses. If it will only fetch aligned addresses, then the example you gave would be correct. If the CPU fetches unaligned addresses, then it might fetch the range 0xFFFF0008 to 0xFFFF0027.
  3. The index bits are still useful, even when cache access is aligned. They give the CPU a shorthand for referencing a byte within a cache line that it can use in its internal bookkeeping. You could get the same information by knowing the address associated with the cache line and the address associated with the byte, but that's a whole lot more information to carry around.

Different CPUs implement caching very differently. For the best answer to your question, please give some additional details about the particular CPU (type, model, etc) that you are talking about.

假情假意假温柔 2024-10-24 16:01:20

This is based on my vague memory; you should read books like "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Great book.

Assuming a 32-bit CPU... (otherwise your figures would need more than 4 bytes for the address, though maybe fewer than 8, since some/most 64-bit CPUs don't use all 64 address bits).

1) I believe it's at least 4*32 bytes. Depending on the CPU, the chip architects may have decided to keep track of other info besides the full address. But it's usually not considered part of the cache.

2) Yes, but how that mapping is done varies. See the Wikipedia article on CPU caches, under associativity. There's the simple direct-mapped cache and the more complex set-associative cache. You want to avoid the case where some code needs two pieces of information but the two addresses map to the exact same cache line.
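The conflict case described here can be shown with a toy simulation. This is only a sketch: it assumes the 1024-byte, 32-byte-line cache from the question and LRU replacement, and `count_misses` is a made-up helper rather than a model of any real CPU.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=32, ways=1):
    """Count misses in a toy set-associative cache with LRU replacement."""
    sets = cache_bytes // line_bytes // ways
    cache = [[] for _ in range(sets)]  # per set: resident tags, most recent last
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % sets, line // sets
        tags = cache[index]
        if tag in tags:
            tags.remove(tag)      # hit: re-appended below as most recent
        else:
            misses += 1
            if len(tags) == ways:
                tags.pop(0)       # set is full: evict the least recently used tag
        tags.append(tag)
    return misses

# Two addresses exactly 1024 bytes apart, accessed alternately.
pattern = [0x0000, 0x0400] * 8
print(count_misses(pattern, ways=1))  # 16: the two lines keep evicting each other
print(count_misses(pattern, ways=2))  # 2: both lines fit in the same set
```

With one way, both addresses map to line 0 and every access misses; with two ways, only the first access to each address misses.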
