CPU and data alignment

Posted on 2024-09-05 12:05:36

Pardon me if this has been answered many times already, but I need answers to the following questions:

  1. Why does data have to be aligned (on 2-byte / 4-byte / 8-byte boundaries)? My doubt is this: since the CPU has address lines Ax Ax-1 Ax-2 ... A2 A1 A0, it can address memory locations sequentially. So why is there a need to align data at specific boundaries?

  2. How do I find the alignment requirements when I am compiling my code and generating the executable?

  3. If, for example, data is aligned on a 4-byte boundary, does that mean each consecutive byte is located at a modulo-4 offset? In other words, if data is 4-byte aligned and a byte is at 1004, is the next byte at 1008 (or at 1005)?

Comments (7)

只是一片海 2024-09-12 12:05:36

CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32 bits, 64 bits, etc.) per address strobe, and the bottom two (or more) address lines are generally don't-care bits.

Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty because internally the CPU performs two memory accesses and a math operation to load one word. If you are doing byte reads, alignment does not apply.

Some CPUs (ARM, or Intel SSE instructions) require aligned memory and either have undefined behavior on unaligned accesses or throw an exception. They save significant silicon area by not implementing the much more complicated load/store subsystem.

Alignment depends on the CPU word size (16, 32, or 64 bits) or, in the case of SSE, on the SSE register size (128 bits).

For your last question: if you are loading a single data byte at a time, there is no alignment restriction on most CPUs (some DSPs don't have byte-level instructions, but you are unlikely to run into one).
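To make the "don't-care bits" remark concrete, here is a minimal sketch in Python of how a byte address splits into a word index (the upper address lines that actually select a memory word) and a byte lane (the low lines that a word-wide memory ignores). The 4-byte word size and the name `split_address` are assumptions made for illustration, not part of any real CPU interface.

```python
WORD_BYTES = 4  # assumption: a 32-bit (4-byte) data bus


def split_address(addr, word_bytes=WORD_BYTES):
    """Split a byte address into (word_index, byte_lane).

    word_index comes from the upper address lines (A2 and up for a
    4-byte word); byte_lane is the low bits (A1, A0) that a word-wide
    memory treats as don't-care.
    """
    if word_bytes <= 0 or word_bytes & (word_bytes - 1):
        raise ValueError("word size must be a power of two")
    lane_bits = word_bytes.bit_length() - 1  # 2 bits for a 4-byte word
    return addr >> lane_bits, addr & (word_bytes - 1)


for address in (1004, 1005, 1006, 1008):
    word_index, byte_lane = split_address(address)
    print(address, "-> word", word_index, "lane", byte_lane)
```

An address is word-aligned exactly when its byte lane is 0, so 1004 and 1008 start a word while 1005 and 1006 fall inside word 251.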

甚是思念 2024-09-12 12:05:36

Very little data "has" to be aligned. It's more that certain types of data may perform better when aligned, or that certain CPU operations require a particular alignment.

First of all, let's say you're reading 4 bytes of data at a time. Let's also say that your CPU has a 32-bit data bus, and that your data is stored at byte 2 in system memory.

Since you can load 4 bytes of data at once, it doesn't make much sense for the address register to point to a single byte. By making the address register point to every 4 bytes, you can address 4 times as much data. In other words, your CPU may only be able to read data starting at bytes 0, 4, 8, 12, 16, etc.

So here's the issue. If you want the data starting at byte 2 and you're reading 4 bytes, then half your data will be in address position 0 (bytes 0-3) and the other half in position 1 (bytes 4-7).

So basically you'd end up hitting the memory twice to read your one 4-byte data element. Some CPUs don't support this sort of operation (or force you to load and combine the two results manually).

Go here for more details: http://en.wikipedia.org/wiki/Data_structure_alignment
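The byte-2 example can be simulated with a toy model. This is a sketch under stated assumptions: memory is a Python `bytes` object, the "bus" can only fetch whole aligned 4-byte words, values are little-endian, and the helper names `read_word` and `load_u32` are hypothetical. It shows the "load and combine the two results manually" step that some CPUs leave to software.

```python
WORD_BYTES = 4  # assumption: 32-bit data bus


def read_word(memory, word_index):
    """Model of the bus: fetch one whole, aligned 4-byte word."""
    start = word_index * WORD_BYTES
    return int.from_bytes(memory[start:start + WORD_BYTES], "little")


def load_u32(memory, addr):
    """Load a 32-bit little-endian value from any byte address.

    Returns (value, bus_reads) so the cost of misalignment is visible.
    """
    word_index, lane = divmod(addr, WORD_BYTES)
    low = read_word(memory, word_index)
    if lane == 0:
        return low, 1  # aligned: a single bus read is enough
    # Misaligned: the value straddles two words, so fetch both and
    # stitch the pieces together with shifts and a mask.
    high = read_word(memory, word_index + 1)
    shift = 8 * lane
    value = ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF
    return value, 2


memory = bytes(range(16))  # byte i holds the value i

for address in (4, 2):
    value, reads = load_u32(memory, address)
    print(f"load at {address}: value=0x{value:08x}, bus reads={reads}")
```

In this model the aligned load at byte 4 costs one bus read, while the load at byte 2 needs words 0 and 1 and therefore two reads.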

仅冇旳回忆 2024-09-12 12:05:36

1.) Some architectures do not have this requirement at all, some encourage alignment (there is a speed penalty when accessing non-aligned data items), and some may enforce it strictly (misalignment causes a processor exception).
Many of today's popular architectures fall into the speed-penalty category. The CPU designers had to make a trade-off between flexibility/performance and cost (silicon area/number of control signals required for bus cycles).

2.) What language, which architecture? Consult your compiler's manual and/or the CPU architecture documentation.

3.) Again, this is totally architecture dependent (some architectures may not permit access to byte-sized items at all, or have bus widths that are not even a multiple of 8 bits). So unless you are asking about a specific architecture, you won't get any useful answers.
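As a complement to reading the manuals for question 2, the alignment that a platform's C ABI assigns to basic types can also be queried at run time. The sketch below uses Python's standard `ctypes` module (`ctypes.sizeof` and `ctypes.alignment`); the selection of types and the helper name `abi_alignments` are illustrative choices. In C itself, the C11 `_Alignof` operator gives the same information at compile time. The numbers differ between platforms, so no particular output is assumed here.

```python
import ctypes

# A few basic C types; which ones to list is an arbitrary choice.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "void *": ctypes.c_void_p,
}


def abi_alignments():
    """Return {type name: (size in bytes, alignment in bytes)} for this platform."""
    return {
        name: (ctypes.sizeof(ctype), ctypes.alignment(ctype))
        for name, ctype in C_TYPES.items()
    }


for name, (size, alignment) in abi_alignments().items():
    print(f"{name:10s} size={size:2d} alignment={alignment:2d}")
```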

丶视觉 2024-09-12 12:05:36

In general, the one answer to all three of those questions is "it depends on your system". Some more details:

  1. Your memory system might not be byte-addressable. Besides that, you might incur a performance penalty when your processor accesses unaligned data. Some processors (older ARM chips, for example) just can't do it at all.

  2. Read the manual for your processor and whatever ABI specification your code is being generated for.

  3. Usually when people refer to data being at a certain alignment, it refers only to the first byte. So if the ABI spec says "data structure X must be 4-byte aligned", it means that X should be placed in memory at an address that is divisible by 4. Nothing is implied by that statement about the size or internal layout of structure X.

    As far as your particular example goes, if the data is 4-byte aligned starting at address 1004, the next byte will be at 1005.
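The 1004 example can be checked with a few lines of arithmetic. This is only an illustrative sketch in Python; `occupied_addresses` and `align_up` are hypothetical helper names, and addresses are modeled as plain integers.

```python
def occupied_addresses(base, size):
    """Byte addresses covered by an object of `size` bytes placed at `base`."""
    return list(range(base, base + size))


def align_up(addr, alignment):
    """Smallest address >= addr that is a multiple of `alignment`."""
    return (addr + alignment - 1) // alignment * alignment


# Only the first byte must sit on a multiple of 4; the object's
# remaining bytes simply follow it consecutively.
print(1004 % 4 == 0)                # the base address is 4-byte aligned
print(occupied_addresses(1004, 4))  # bytes of one 4-byte object
print(align_up(1005, 4))            # where the next 4-byte-aligned object may start
```

So the bytes of the object sit at 1004, 1005, 1006 and 1007, and 1008 is merely the next address at which another 4-byte-aligned object could begin.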

刘备忘录 2024-09-12 12:05:36

It completely depends on the CPU you are using!

Some architectures deal only in 32 (or 36!) bit words, and you need special instructions to load single characters or half words.

Some CPUs (notably PowerPC and other IBM RISC chips) don't care about alignment and will load integers from odd addresses.

For most modern architectures you need to align integers to word boundaries and long integers to double-word boundaries. This simplifies the circuitry for loading registers and speeds things up ever so slightly.

对不⑦ 2024-09-12 12:05:36

Data alignment is required by the CPU for performance reasons. The Intel website gives details on how to align data in memory:

Data Alignment when Migrating to 64-Bit Intel® Architecture

One of these is the alignment of data items – their location in memory in relation to addresses that are multiples of four, eight or 16 bytes. Under the 16-bit Intel architecture, data alignment had little effect on performance, and its use was entirely optional. Under IA-32, aligning data correctly can be an important optimization, although its use is still optional with a very few exceptions, where correct alignment is mandatory. The 64-bit environment, however, imposes more-stringent requirements on data items. Misaligned objects cause program exceptions. For an item to be aligned properly, it must fulfill the requirements imposed by 64-bit Intel architecture (discussed shortly), plus those of the linker used to build the application.

The fundamental rule of data alignment is that the safest (and most widely supported) approach relies on what Intel terms "the natural boundaries." Those are the ones that occur when you round up the size of a data item to the next largest size of two, four, eight or 16 bytes. For example, a 10-byte float should be aligned on a 16-byte address, whereas 64-bit integers should be aligned to an eight-byte address. Because this is a 64-bit architecture, pointer sizes are all eight bytes wide, and so they too should align on eight-byte boundaries.
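The rounding rule in the paragraph above can be written down directly. The following is a minimal sketch in Python; the function name `natural_boundary` and the cap of 16 bytes are assumptions taken from the wording of the quoted text, not an Intel-provided API.

```python
def natural_boundary(size_bytes, cap=16):
    """Round a data item's size up to the next of 1, 2, 4, 8 or 16 bytes.

    The result is the item's "natural boundary" in the sense of the
    quoted text; sizes above `cap` are clamped to `cap`.
    """
    if size_bytes < 1:
        raise ValueError("size must be at least one byte")
    boundary = 1
    while boundary < size_bytes and boundary < cap:
        boundary *= 2
    return boundary


for size in (1, 2, 4, 8, 10, 16):
    print(f"{size:2d}-byte item -> align on {natural_boundary(size):2d}-byte boundary")
```

With this rule the 10-byte float from the text lands on a 16-byte boundary and a 64-bit (8-byte) integer or pointer on an 8-byte boundary.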

It is recommended that all structures larger than 16 bytes align on 16-byte boundaries. In general, for the best performance, align data as follows:

  • Align 8-bit data at any address
  • Align 16-bit data to be contained within an aligned four-byte word
  • Align 32-bit data so that its base address is a multiple of four
  • Align 64-bit data so that its base address is a multiple of eight
  • Align 80-bit data so that its base address is a multiple of sixteen
  • Align 128-bit data so that its base address is a multiple of sixteen

A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16-byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary, although it is an easy way to enforce adherence to general alignment recommendations.

Aligning data correctly within structures can cause data bloat (due to the padding necessary to place fields correctly), so where necessary and possible, it is useful to reorganize structures so that fields that require the widest alignment are first in the structure. More on solving this problem appears in the article "Preparing Code for the IA-64 Architecture (Code Clean)."
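The padding effect and the benefit of placing the widest fields first can be illustrated with a small layout calculator. This is a simplified model in Python, assuming every field is aligned to its own size and the structure is padded to a multiple of its widest field; real compilers follow their platform ABI, which may differ. The field names and the helper `struct_layout` are hypothetical.

```python
def align_up(offset, alignment):
    """Smallest offset >= offset that is a multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def struct_layout(fields):
    """Compute (offsets, total_size) for a list of (name, size_bytes) fields.

    Assumption: each field is aligned to its own size, and the total
    size is rounded up to the widest field's alignment.
    """
    offsets = {}
    offset = 0
    widest = 1
    for name, size in fields:
        offset = align_up(offset, size)  # insert padding before the field
        offsets[name] = offset
        offset += size
        widest = max(widest, size)
    return offsets, align_up(offset, widest)  # tail padding


padded = [("flag", 1), ("counter", 8), ("tag", 1)]
reordered = [("counter", 8), ("flag", 1), ("tag", 1)]

for label, fields in (("original order", padded), ("widest first", reordered)):
    offsets, size = struct_layout(fields)
    print(f"{label}: offsets={offsets}, size={size}")
```

Under these assumptions the original order occupies 24 bytes, 14 of them padding, while the reordered structure fits in 16 bytes.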

话少心凉 2024-09-12 12:05:36

For Intel architecture, Chapter 4, "Data Types", of the Intel 64 and IA-32 Architectures Software Developer's Manual answers your question 1.
