Why does the CPU access memory on word boundaries?


I have heard many times that data should be properly aligned in memory for better access efficiency, and that the CPU accesses memory on a word boundary.

So in the following scenario, the CPU has to make 2 memory accesses to get a single word.

Supposing: 1 word = 4 bytes

("|" stands for word boundary. "o" stands for byte boundary)


|----o----o----o----|----o----o----o----|   (The word boundary in CPU's eye)
           ----o----o----o----              (What I want to read from memory)

Why does this happen? What is the root cause of the CPU only being able to read at word boundaries?

If the CPU can only access memory at 4-byte word boundaries, the address bus should only need to be 30 bits wide, not 32 bits, because the last 2 bits are always 0 from the CPU's point of view.
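
As a minimal sketch of that arithmetic (my addition, not part of the original question): the low 2 bits of a byte address select a byte within a word, the remaining bits select the word, and a 4-byte read at a non-zero offset necessarily touches two words.

#include <stdio.h>

int main(void) {
    unsigned byte_addr = 0x1006u;             /* arbitrary example byte address */

    unsigned word_index  = byte_addr >> 2;    /* the "30-bit" part mentioned above */
    unsigned byte_offset = byte_addr & 0x3u;  /* the 2 low bits that are 0 when aligned */

    printf("byte address 0x%X -> word index 0x%X, byte offset %u\n",
           byte_addr, word_index, byte_offset);

    /* A 4-byte read starting here is aligned only if the offset is 0;
       otherwise it spans word_index and word_index + 1: two memory accesses. */
    if (byte_offset == 0)
        printf("aligned: one word access\n");
    else
        printf("misaligned: spans words 0x%X and 0x%X\n", word_index, word_index + 1);
    return 0;
}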

ADD 1

What's more, if we accept that the CPU must read at word boundaries, why can't a boundary start at the place I want to read from? The boundaries seem to be fixed from the CPU's point of view.

ADD 2

According to AnT's answer, it seems that the boundaries are hardwired, and that they are hardwired by the memory access hardware. The CPU itself is innocent as far as this is concerned.


Comments (4)

深空失忆 2024-09-25 04:21:19


The meaning of "can" (in "...CPU can access...") in this case depends on the hardware platform.

On the x86 platform, CPU instructions can access data aligned on absolutely any boundary, not only on a "word boundary". Misaligned access might be less efficient than aligned access, but the reasons for that have absolutely nothing to do with the CPU; they have everything to do with how the underlying low-level memory access hardware works. It is quite possible that in this case the memory-related hardware will have to make two accesses to the actual memory, but that is something CPU instructions don't know about and don't need to know about. As far as the CPU is concerned, it can access any data on any boundary. The rest is implemented transparently to CPU instructions.

On hardware platforms like Sun SPARC, CPU cannot access misaligned data (in simple words, your program will crash if you attempt to), which means that if for some reason you need to perform this kind of misaligned access, you'll have to implement it manually and explicitly: split it into two (or more) CPU instructions and thus explicitly perform two (or more) memory accesses.
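
As a rough sketch of what that manual split can look like in C (my illustration, assuming a little-endian layout and a made-up buffer and offset): two aligned word loads, then shifts and an OR to rebuild the misaligned 32-bit value.

#include <stdio.h>
#include <stdint.h>

static uint32_t load_u32_misaligned(const uint32_t *aligned_base, unsigned byte_offset) {
    unsigned word  = byte_offset >> 2;       /* which aligned word holds the first byte */
    unsigned shift = (byte_offset & 3) * 8;  /* how far into that word the value starts */

    uint32_t lo = aligned_base[word];        /* first aligned access */
    if (shift == 0)
        return lo;                           /* already aligned: one access is enough */

    uint32_t hi = aligned_base[word + 1];    /* second aligned access */
    return (lo >> shift) | (hi << (32 - shift));
}

int main(void) {
    /* Two aligned words; the value 0x44332211 sits at byte offset 2. */
    uint32_t mem[2] = { 0x2211ABCDu, 0xEF664433u };
    printf("0x%08X\n", (unsigned)load_u32_misaligned(mem, 2));
    return 0;
}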

As for why it is so... well, that's just how modern computer memory hardware works. The data has to be aligned. If it is not aligned, the access either is less efficient or does not work at all.

A very simplified model of modern memory would be a grid of cells (rows and columns), each cell storing a word of data. A programmable robotic arm can put a word into a specific cell and retrieve a word from a specific cell. One at a time. If your data is spread across several cells, you have no other choice but to make several consecutive trips with that robotic arm. On some hardware platforms the task of organizing these consecutive trips is hidden from CPU (meaning that the arm itself knows what to do to assemble the necessary data from several pieces), on other platforms it is visible to the CPU (meaning that it is the CPU who's responsible for organizing these consecutive trips of the arm).

赴月观长安 2024-09-25 04:21:19


It saves silicon in the addressing logic if you can make certain assumptions about the address (like "bottom n bits are zero"). Some CPUs (x86 and their work-alikes) will put logic in place to turn misaligned data into multiple fetches, concealing some nasty performance hits from the programmer. Most CPUs outside of that world will instead raise a hardware error explaining in no uncertain terms that they don't like this.

All the arguments you're going to hear about "efficiency" are bollocks or, more precisely, are begging the question. The real reason is simply that it saves silicon in the processor core if the number of address bits can be reduced for operations. Any inefficiency that arises from misaligned access (like in the x86 world) is a result of hardware design decisions, not intrinsic to addressing in general.

Now that being said, for most use cases the hardware design decision makes sense. If you're accessing data in two-byte words, most common use cases have you access offset, then offset+2, then offset+4 and so on. Being able to increment the address byte-wise while accessing two-byte words is typically (as in 99.44% certainly) not what you want to be doing. As such it doesn't hurt to require address offsets to align on word boundaries (it's a mild, one-time inconvenience when you design your data structures) but it sure does save on your silicon.
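
This is also what compilers do on your behalf when you lay out those data structures: padding is inserted so that each field lands on its natural boundary. A small sketch of my own (the exact offsets and padding depend on the ABI):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct Record {
    uint8_t  tag;    /* offset 0 */
                     /* 3 padding bytes are typically inserted here */
    uint32_t value;  /* usually offset 4, so it stays word-aligned */
    uint16_t count;  /* offset 8, followed by trailing padding so that
                        arrays of Record keep 'value' aligned as well */
};

int main(void) {
    printf("offsetof(tag)   = %zu\n", offsetof(struct Record, tag));
    printf("offsetof(value) = %zu\n", offsetof(struct Record, value));
    printf("offsetof(count) = %zu\n", offsetof(struct Record, count));
    printf("sizeof(Record)  = %zu\n", sizeof(struct Record));
    return 0;
}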

As a historical aside, I worked once on an Interdata Model 70 -- a 16-bit minicomputer. It required all memory access to be 16-bit aligned. It also had, even by the standards of the time, a very small amount of memory by the time I was working on it. (It was a relic even back then.) The word alignment was used to double the memory capacity, since the wire-wrapped CPU could easily be hacked: new address decode logic was added that took a 1 in the low bit of the address (previously an alignment error in the making) and used it to switch to a second bank of memory. Try that without alignment logic! :)

我是有多爱你 2024-09-25 04:21:19


Because it is more efficient.

In your example, the CPU would have to do two reads: it has to read in the first half, then read in the second half separately, and then reassemble them to do the computation. That is much more complicated and slower than doing the read in one go, which is what happens when the data is properly aligned.

Some processors, like x86, can tolerate misaligned data access (so you would still need all 32 bits) - others like Itanium absolutely cannot handle misaligned data accesses and will complain quite spectacularly.
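
A portable way to express the read from the question, as a sketch under the usual C rules (the buffer and offset are made up), is to memcpy out of the misaligned location. On x86 this typically compiles down to a single unaligned load, while strict-alignment targets get the smaller loads plus reassembly, so neither family of processors faults.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint32_t read_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* defined behaviour even if p is misaligned */
    return v;
}

int main(void) {
    uint8_t buf[8] = { 0xCD, 0xAB, 0x11, 0x22, 0x33, 0x44, 0x66, 0xEF };
    /* buf + 2 is not 4-byte aligned; dereferencing it directly as a uint32_t*
       would be undefined behaviour and may trap on strict-alignment CPUs. */
    printf("0x%08X\n", (unsigned)read_u32(buf + 2));
    return 0;
}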

岁月无声 2024-09-25 04:21:19


Word alignment is not a feature of CPUs alone.

At the hardware level, most RAM modules have a given word size, in the sense of the number of bits that can be accessed per read/write cycle.

On a module I had to interface with on an embedded device, addressing was implemented through three parameters: the module was organized into four banks, which could be selected prior to the read/write operation. Each of these banks was essentially a large table of 32-bit words which could be addressed through a row and column index.

In this design, access was only possible per cell, so every read operation returned 4 bytes, and every write operation expected 4 bytes.

A memory controller hooked up to this RAM chip could be designed in two ways: either allowing unrestricted access to the memory chip, using several cycles to split/merge unaligned data to/from several cells (with additional logic), or imposing some restrictions on how memory can be accessed, with the gain of reduced complexity.

As complexity can impede maintainability and performance, most designers chose the latter [citation needed]
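
As a toy model of my own of those two controller designs (word-addressed cells, little-endian interpretation, values chosen arbitrarily): one policy spends an extra cycle and some merge logic to hide the misalignment, the other simply rejects it.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define CELLS 16
static uint32_t ram[CELLS];                 /* each cell holds one 32-bit word */

static uint32_t read_cell(unsigned index) { /* one read cycle per cell */
    return ram[index % CELLS];
}

/* Policy A: additional logic in the controller hides the misalignment. */
static uint32_t read_splitting(unsigned byte_addr) {
    unsigned shift = (byte_addr & 3) * 8;
    uint32_t lo = read_cell(byte_addr >> 2);
    if (shift == 0)
        return lo;                          /* aligned: one cycle */
    uint32_t hi = read_cell((byte_addr >> 2) + 1);
    return (lo >> shift) | (hi << (32 - shift));  /* two cycles plus a merge */
}

/* Policy B: reduced complexity, aligned accesses only. */
static bool read_strict(unsigned byte_addr, uint32_t *out) {
    if (byte_addr & 3)
        return false;                       /* signal an alignment fault */
    *out = read_cell(byte_addr >> 2);
    return true;
}

int main(void) {
    ram[1] = 0x2211ABCDu;
    ram[2] = 0xEF664433u;
    uint32_t v;
    printf("splitting controller: 0x%08X\n", (unsigned)read_splitting(6));
    printf("strict controller:    %s\n", read_strict(6, &v) ? "ok" : "alignment fault");
    return 0;
}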
