CPU and data alignment
Pardon me if you feel this has been answered numerous times, but I need answers to the following queries!
Why does data have to be aligned (on 2-byte / 4-byte / 8-byte boundaries)? My doubt is that when the CPU has address lines Ax Ax-1 Ax-2 ... A2 A1 A0, it is quite possible to address memory locations sequentially. So why is there a need to align data at specific boundaries?
How do I find the alignment requirements when I am compiling my code and generating the executable?
If, for example, the data alignment is a 4-byte boundary, does that mean each consecutive byte is located at a modulo-4 offset? My doubt is: if data is 4-byte aligned, does that mean that if a byte is at 1004, the next byte is at 1008 (or at 1005)?
Comments (7)
CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32 bits, 64 bits, etc.) per address strobe, where the bottom two (or more) address lines are generally don't-care bits.
Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty, because internally the CPU performs two memory accesses and a math operation to load one word. If you are doing byte reads, no alignment applies.
Some CPUs (some ARM cores, or the aligned Intel SSE load/store instructions) require aligned memory and either behave in an undefined way or raise an exception on an unaligned access. They save significant silicon space by not implementing the much more complicated load/store subsystem.
Alignment depends on the CPU word size (16, 32, or 64 bits) or, in the case of SSE, on the SSE register size (128 bits).
For your last question: if you are loading a single data byte at a time, there is no alignment restriction on most CPUs (some DSPs don't have byte-level instructions, but it's likely you won't run into one).
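To make the "don't-care bits" point concrete, here is a minimal sketch of an alignment check. It assumes a power-of-two alignment; the function name `is_aligned` is made up for illustration. An address is aligned exactly when its low log2(alignment) bits are all zero.

```python
def is_aligned(address, alignment):
    """Return True if address is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power-of-two alignment, the low address bits select a byte
    # inside the word. An aligned address has all of them cleared.
    return (address & (alignment - 1)) == 0


print(is_aligned(1004, 4))  # 1004 = 251 * 4, so the two low bits are zero
print(is_aligned(1006, 4))  # low bits are 0b10, so not 4-byte aligned
print(is_aligned(1005, 1))  # single bytes have no alignment restriction
```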
Very little data "has" to be aligned. It's more that certain types of data may perform better when aligned, or that certain CPU operations require a certain data alignment.
First of all, let's say you're reading 4 bytes of data at a time. Let's also say that your CPU has a 32-bit data bus, and that your data is stored at byte 2 in system memory.
Now, since you can load 4 bytes of data at once, it doesn't make much sense to have your address register point to a single byte. By making your address register point to every 4 bytes, you can address 4 times as much data. In other words, your CPU may only be able to read data starting at bytes 0, 4, 8, 12, 16, etc.
So here's the issue. If you want the data starting at byte 2 and you're reading 4 bytes, then half your data will be in word 0 (bytes 0-3) and the other half in word 1 (bytes 4-7).
So basically you'd end up hitting memory twice to read your one 4-byte data element. Some CPUs don't support this sort of operation (or force you to load and combine the two results manually).
Go here for more details: http://en.wikipedia.org/wiki/Data_structure_alignment
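The byte-2 example can be worked through with a small sketch. It models memory as bus-width words, which is a simplification (real hardware adds caches and wider buses), and `words_touched` is a hypothetical helper name.

```python
def words_touched(address, size, bus_bytes=4):
    """List the bus-width words that an access of `size` bytes at `address` spans."""
    first = address // bus_bytes               # word holding the first byte
    last = (address + size - 1) // bus_bytes   # word holding the last byte
    return list(range(first, last + 1))


print(words_touched(2, 4))  # [0, 1]: the misaligned read needs two word fetches
print(words_touched(4, 4))  # [1]: the aligned read needs only one
```

Each entry in the returned list stands for one memory access in this model, so the misaligned read costs twice as many.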
1.) Some architectures do not have this requirement at all, some encourage alignment (there is a speed penalty when accessing non-aligned data items), and some may enforce it strictly (misalignment causes a processor exception).
Many of today's popular architectures fall in the speed penalty category. The CPU designers had to make a trade-off between flexibility/performance and cost (silicon area/number of control signals required for bus cycles).
2.) What language, which architecture? Consult your compiler's manual and/or the CPU architecture documentation.
3.) Again, this is totally architecture dependent (some architectures may not permit access to byte-sized items at all, or have bus widths which are not even a multiple of 8 bits). So unless you are asking about a specific architecture, you won't get any useful answers.
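As one rough way to see what your own platform's C ABI requires, the sketch below asks Python's standard `ctypes` module for the size and alignment of a few C types. The helper name `c_type_alignments` is made up for illustration, and the numbers it prints depend on the machine and compiler Python was built with. In C itself, the C11 `_Alignof` operator (or `alignof` in C++11) reports the same information at compile time.

```python
import ctypes


def c_type_alignments():
    """Map a few C type names to (size, alignment) in bytes on this platform."""
    types = {
        "char": ctypes.c_char,
        "short": ctypes.c_short,
        "int": ctypes.c_int,
        "long long": ctypes.c_longlong,
        "double": ctypes.c_double,
        "void *": ctypes.c_void_p,
    }
    return {name: (ctypes.sizeof(t), ctypes.alignment(t))
            for name, t in types.items()}


for name, (size, align) in c_type_alignments().items():
    print(f"{name:10s} size={size} alignment={align}")
```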
In general, the one answer to all three of those questions is "it depends on your system". Some more details:
Your memory system might not be byte-addressable. Besides that, you might incur a performance penalty to have your processor access unaligned data. Some processors (like older ARM chips, for example) just can't do it at all.
Read the manual for your processor and whatever ABI specification your code is being generated for.
Usually when people refer to data being at a certain alignment, it refers only to the first byte. So if the ABI spec said "data structure X must be 4-byte aligned", it means that X should be placed in memory at an address that's divisible by 4. Nothing is implied by that statement about the size or internal layout of structure X.
As far as your particular example goes, if the data is 4-byte aligned starting at address 1004, the next byte will be at 1005.
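The 1004 example can be spelled out as a short sketch. The helper names `byte_addresses` and `next_aligned` are hypothetical. The bytes of one object are always consecutive; alignment only constrains where an object may start.

```python
def byte_addresses(start, size):
    """Addresses occupied by an object of `size` bytes beginning at `start`."""
    return list(range(start, start + size))


def next_aligned(address, alignment):
    """Smallest multiple of `alignment` that is greater than or equal to `address`."""
    return (address + alignment - 1) // alignment * alignment


print(byte_addresses(1004, 4))  # [1004, 1005, 1006, 1007]
print(next_aligned(1005, 4))    # 1008: the next place a 4-byte-aligned object could start
```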
It completely depends on the CPU you are using!
Some architectures deal only in 32 (or 36!) bit words, and you need special instructions to load single characters or half words.
Some CPUs (notably PowerPC and other IBM RISC chips) don't care about alignment and will load integers from odd addresses.
For most modern architectures you need to align integers to word boundaries and long integers to double-word boundaries. This simplifies the circuitry for loading registers and speeds things up ever so slightly.
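Compilers usually handle this for you by inserting padding between structure members. Here is a small sketch using Python's `ctypes`, which lays structures out the way the platform's C compiler would. The `Sample` structure and `field_offsets` helper are made up for illustration, and the exact offsets depend on your platform's ABI.

```python
import ctypes


class Sample(ctypes.Structure):
    # Hypothetical structure mixing 1-byte fields with wider integers.
    _fields_ = [
        ("a", ctypes.c_char),
        ("b", ctypes.c_int32),
        ("c", ctypes.c_char),
        ("d", ctypes.c_int64),
    ]


def field_offsets(struct_type):
    """Map each field name to its byte offset within the structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


print(field_offsets(Sample))  # offsets include padding inserted before b and d
print(ctypes.sizeof(Sample))  # total size, padding included
```

On a typical 64-bit platform the 4-byte field lands on a multiple of 4 and the 8-byte field on a multiple of 8, so the structure is larger than the 14 bytes its members add up to.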
Data alignment is required by the CPU for performance reasons. Intel's website gives details on how to align data in memory:
Data Alignment when Migrating to 64-Bit Intel® Architecture
For Intel architecture, Chapter 4, "Data Types", of the Intel 64 and IA-32 Architectures Software Developer’s Manual answers your question 1.