C code analysis

Posted on 2024-09-25 17:37:29

Here is a function that I am writing on a 64-bit Linux machine.

void myfunc(unsigned char* arr) //array of 8 bytes is passed by reference
{
   unsigned long a = 0; //8 bytes
   unsigned char* LL = (unsigned char*) &a;

   LL[0] = arr[6];
   LL[1] = arr[3];
   LL[2] = arr[1];
   LL[3] = arr[7];
   LL[4] = arr[5];
   LL[5] = arr[4];
   LL[6] = arr[0];
   LL[7] = arr[2];
}

Now my questions are:

  1. Will variable 'a' be stored in a register, so that it won't be fetched again and again from RAM or cache?
  2. Working on a 64-bit architecture, should I assume that the 'arr' array will be stored in a register, since function parameters are passed in registers on 64-bit architectures?
  3. How efficient is pointer type casting? My guess is that it is quite inefficient.

Any help would be appreciated.

Regards

Comments (4)

岁吢 2024-10-02 17:37:30
  1. a cannot be stored in a register, as you have taken its address. (valdo correctly points out that a really smart compiler could optimize the array accesses into bit operations and leave a in a register, but I've never seen a compiler do that, and I'm not sure it would wind up being faster).
  2. arr (the pointer itself) is passed in a register (%rdi, on amd64 Linux). The contents of the array are in memory.
  3. Pointer type casting by itself often generates no code at all. However, doing silly things with type casts can lead to very inefficient code, or even to code whose behavior is undefined.

It looks like you are trying to permute the bytes in an array and then shove them into a number, and the machine code your example generates is not terribly bad for that. David's suggestion to use shift and mask operations instead is good (this will also avoid problems if your code ever needs to run on a big-endian machine), and there are also the SSE vector permute instructions, but I have heard they're kind of a pain to use.

Incidentally, you should make the return type of your example function be unsigned long and put return a; at the very end; then you can use gcc -O2 -S and see exactly what you get from compilation. Without the change to return a, GCC will cheerfully optimize away the entire body of the function, since it has no externally visible side effects.
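
A minimal sketch of that change, assuming 64-bit Linux where unsigned long is 8 bytes. The name myfunc_ret and the const qualifier are illustrative choices, not part of the original question. Because the result is returned, the optimizer can no longer discard the byte stores, and the output of gcc -O2 -S shows what they compile to.

```c
/* Hypothetical variant of myfunc that returns the assembled value.
   Assumes sizeof(unsigned long) == 8, as on 64-bit Linux. */
unsigned long myfunc_ret(const unsigned char *arr)
{
    unsigned long a = 0;                      /* 8 bytes */
    unsigned char *LL = (unsigned char *)&a;  /* byte-wise view of a */

    LL[0] = arr[6];
    LL[1] = arr[3];
    LL[2] = arr[1];
    LL[3] = arr[7];
    LL[4] = arr[5];
    LL[5] = arr[4];
    LL[6] = arr[0];
    LL[7] = arr[2];

    return a;  /* makes the stores above observable to the caller */
}
```

The numeric value returned depends on the byte order of the machine, since the stores go to memory positions of a rather than to particular bit positions.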

桜花祭 2024-10-02 17:37:30

You might do better to use explicit shift and mask instructions to accomplish this, instead of using array indexing.

The array operations are going to make it harder for the compiler to use registers for this, because there typically are not instructions that do things like "load 8 bits from the 3rd byte of register A". (An optimizing compiler could figure out that it's possible to do this with shifts/masks, but I'm not sure how likely that is).
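
One way the shift-based approach might look, as a sketch rather than a drop-in replacement: the name myfunc_shift and the use of uint64_t from <stdint.h> are assumptions made here for illustration. Each source byte is widened to 64 bits and shifted to its target position, with position 0 being the least significant byte. No mask is needed in this form because each operand is already a single byte.

```c
#include <stdint.h>

/* Hypothetical shift-based variant: builds the value entirely from
   integer operations, so no address of a local variable is taken. */
uint64_t myfunc_shift(const unsigned char *arr)
{
    return  (uint64_t)arr[6]          /* result byte 0 (least significant) */
         | ((uint64_t)arr[3] <<  8)   /* result byte 1 */
         | ((uint64_t)arr[1] << 16)   /* result byte 2 */
         | ((uint64_t)arr[7] << 24)   /* result byte 3 */
         | ((uint64_t)arr[5] << 32)   /* result byte 4 */
         | ((uint64_t)arr[4] << 40)   /* result byte 5 */
         | ((uint64_t)arr[0] << 48)   /* result byte 6 */
         | ((uint64_t)arr[2] << 56);  /* result byte 7 (most significant) */
}
```

On a little-endian machine this should produce the same number as the byte-store version of the function. On a big-endian machine it still produces this number, whereas the byte-store version would not.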

始终不够爱げ你 2024-10-02 17:37:30
  1. Whether the variable a will be stored in a register is a matter of optimization. Since there is no volatile modifier, IMHO a smart compiler will do this.

  2. It's a question of the calling convention. If by convention a single pointer parameter is passed in a register, then so will arr.

  3. Pointer type casting is not an operation that the CPU executes. There's no code generated for it. It is just information for the compiler about what you mean.

(Actually, casting sometimes does produce extra code, but this is related to multiple inheritance and polymorphism in C++.)

慢慢从新开始 2024-10-02 17:37:30

Depends on your optimization level. You can examine the assembly to answer your questions. With gcc, use the "-S" flag.

gcc -S -O0 -o /tmp/xx-O0.s /tmp/xx.c
gcc -S -O3 -o /tmp/xx-O3.s /tmp/xx.c

The generated assembly is completely different. (Be sure to make the return a; change suggested by Zack.)

See also this message for hints on how to generate a mixed C/assembly listing (which quickly becomes useless with optimization).
