Can gcc/g++ tell me when it ignores my register?

Posted on 2024-09-14 19:35:53


When compiling C/C++ code using gcc/g++, if it ignores my register, can it tell me?
For example, in this code

int main()
{
    register int j;
    int k;
    for(k = 0; k < 1000; k++)
        for(j = 0; j < 32000; j++)
            ;
    return 0;
}

j will be kept in a register, but in this code

int main()
{
    register int j;
    int k;
    for(k = 0; k < 1000; k++)
        for(j = 0; j < 32000; j++)
            ;
    int * a = &j;
    return 0;
}

j will be a normal variable.
Can it tell me whether a variable I declared register is really stored in a CPU register?


二智少女猫性小仙女 2024-09-21 19:35:53

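You can fairly assume that GCC ignores the register keyword, except perhaps at -O0. However, it shouldn't make a difference one way or the other, and if you are digging this deep, you should already be reading the assembly code.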

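Here is an informative thread on this topic: http://gcc.gnu.org/ml/gcc/2010-05/msg00098.html . Back in the old days, register really did help compilers allocate a variable to a register, but today register allocation is done automatically and optimally, without hints. The keyword still serves two purposes in C:

In C, it prevents you from taking the address of a variable. Since registers don't have addresses, this restriction can help a simple C compiler. (Simple C++ compilers don't exist.)
A register object cannot be declared restrict. Because restrict pertains to addresses, their intersection is meaningless. (C++ does not have restrict yet, and in any case this rule is rather trivial.)

For C++, the keyword has been deprecated since C++11 and was proposed for removal in the standard revision planned for 2017.

Some compilers have used register on parameter declarations to determine the calling convention of functions, where the ABI allows a mix of stack-based and register-based parameters. This seems to be nonconforming, it tends to occur with extended syntax like register("A1"), and I don't know whether any such compiler is still in use.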
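The address-of restriction mentioned above can be illustrated with a small sketch. The function name sum_to and the macro TRY_ADDRESS_OF are made up for this illustration; the point is only that a register local behaves like any other local until you try to apply & to it in C.

```c
/* Sketch: a register-qualified local is used like any other local.
   Whether it ends up in a CPU register is up to the compiler. */
int sum_to(int n)
{
    register int total = 0;   /* a hint only; the compiler may ignore it */
    register int i;

    for (i = 1; i <= n; i++)
        total += i;

#ifdef TRY_ADDRESS_OF         /* hypothetical macro, off by default */
    /* In C, applying & to a register variable is a constraint
       violation, so a C compiler should reject this line. */
    int *p = &total;
    (void)p;
#endif

    return total;
}
```

Building the file as C with TRY_ADDRESS_OF defined should produce a diagnostic, which is the only place the keyword reliably makes itself felt.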


栀子花开つ 2024-09-21 19:35:53

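With respect to modern compilation and optimization techniques, the register annotation does not make any sense at all. In your second program you take the address of j, and registers do not have addresses, but one and the same local or static variable could perfectly well be stored in two different memory locations during its lifetime, or sometimes in memory and sometimes in a register, or not exist at all. Indeed, an optimizing compiler would compile your nested loops to nothing, because they have no effect, and simply assign their final values to k and j. It would then omit those assignments as well, because the remaining code does not use the values.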
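As a rough illustration of that point, the two functions below are meant to be observably equivalent. The names loop_final_sum and loop_final_sum_folded are invented here, and the return value is added only so that the final counter values are visible; the question's code does not use them at all.

```c
/* Sketch: the loops have no observable effect, so only the final
   values of the counters matter to the rest of the function. */
int loop_final_sum(void)
{
    register int j = 0;
    int k;

    for (k = 0; k < 1000; k++)
        for (j = 0; j < 32000; j++)
            ;

    /* After the loops, k == 1000 and j == 32000. */
    return k + j;
}

/* What an optimizer is free to reduce the function above to. */
int loop_final_sum_folded(void)
{
    return 1000 + 32000;
}
```

Whether a given compiler actually performs this reduction depends on the optimization level, so comparing the generated assembly of both functions is the way to check.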


吾家有女初长成 2024-09-21 19:35:53

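You can't take the address of a register variable in C, and on top of that the compiler is free to ignore you completely; C99 standard, section 6.7.1 (pdf):

The implementation may treat any register declaration simply as an auto declaration. However, whether or not addressable storage is actually used, the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1). Thus, the only operator that can be applied to an array declared with storage-class specifier register is sizeof.

Unless you're mucking around on 8-bit AVRs or PICs, the compiler will probably laugh at you for thinking you know best and ignore your pleas. Even on those, I've thought I knew better a couple of times and found ways to trick the compiler (with some inline asm), but my code exploded because it had to shuffle a bunch of other data around to work around my stubbornness.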
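The last sentence of the quoted passage can be seen in a tiny sketch. The function name register_array_bytes and the array name samples are made up for the example; it assumes the file is compiled as C.

```c
#include <stddef.h>

/* Sketch: sizeof is the one operator the quoted passage allows on a
   register array, because it does not convert the array to a pointer. */
size_t register_array_bytes(void)
{
    register int samples[4];

    /* Indexing (samples[0]) would first convert the array name to a
       pointer, which the passage above rules out for register arrays. */
    return sizeof samples;
}
```

This makes a register array nearly useless in practice, which is consistent with the view that the keyword is mostly a restriction rather than an optimization tool.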


微暖i 2024-09-21 19:35:53

This question, some of the answers, and several other discussions of the 'register' keyword I've seen seem to assume implicitly that all locals are mapped either to a specific register or to a specific memory location on the stack. This was generally true until 15-25 years ago, and it's true if you turn off optimizing, but it's not true at all when standard optimizing is performed. Locals are now seen by optimizers as symbolic names that you use to describe the flow of data, rather than as values that need to be stored in specific locations.

Note: by 'locals' here I mean: scalar variables, of storage class auto (or 'register'), which are never used as the operand of '&'. Compilers can sometimes break up auto structs, unions or arrays into individual 'local' variables, too.

To illustrate this: suppose I write this at the top of a function:

int factor = 8;

.. and then the only use of the factor variable is to multiply by various things:

arr[i + factor*j] = arr[i - factor*k];

In this case - try it if you want - there will be no factor variable. The code analysis will show that factor is always 8, and so all the multiplications will turn into shifts (<<3). If you did the same thing in 1985 C, factor would get a location on the stack, and there would be multiplies, since the compilers basically worked one statement at a time and didn't remember anything about the values of the variables. Back then programmers would be more likely to use #define factor 8 to get better code in this situation, while keeping the factor adjustable.
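To make the claim concrete, here is a minimal sketch of the two forms. The function names are invented for the illustration, and unsigned operands are an assumption made so that the shift is well defined for every input.

```c
/* Sketch: factor is a local that is always 8 and never has its
   address taken, so an optimizer can treat it as the constant 8. */
unsigned index_with_factor(unsigned i, unsigned j)
{
    unsigned factor = 8;
    return i + factor * j;
}

/* The form the optimizer is expected to arrive at on its own. */
unsigned index_with_shift(unsigned i, unsigned j)
{
    return i + (j << 3);      /* multiply by 8 expressed as a shift */
}
```

Compiling both with optimization enabled and comparing the assembly output is a reasonable way to check whether your compiler really generates the same code for them.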

If you use -O0 (optimization off) - you will indeed get a variable for factor. This will allow you, for instance, to step over the factor=8 statement, and then change factor to 11 with the debugger, and keep going. In order for this to work, the compiler can't keep anything in registers between statements, except for variables which are assigned to specific registers; and in that case the debugger is informed of this. And it can't try to 'know' anything about the values of variables, since the debugger could change them. In other words, you need the 1985 situation if you want to change local variables while debugging.

Modern compilers generally compile a function as follows:

(1) When a local variable is assigned to more than once in a function, the compiler creates different 'versions' of the variable so that each one is only assigned in one place. All of the 'reads' of the variable refer to a specific version.

(2) Each of these locals is assigned to a 'virtual' register. Intermediate calculation results are also assigned to variables/registers; so

  a = b*c + 2*k;

becomes something like

       t1 = b*c;
       t2 = 2;
       t3 = k*t2;
       a = t1 + t3;

(3) The compiler then takes all these operations, and looks for common subexpressions, etc. Since each of the new registers is only ever written once, it is rather easier to rearrange them while maintaining correctness. I won't even start on loop analysis.

(4) The compiler then tries to map all these virtual registers into actual registers in order to generate code. Since each virtual register has a limited lifetime it is possible to reuse actual registers heavily - 't1' in the above is only needed until the add which generates 'a', so it could be held in the same register as 'a'. When there are not enough registers, some of the virtual registers can be allocated to memory -- or -- a value can be held in a certain register, stored to memory for a while, and loaded back into a (possibly) different register later. On a load-store machine, where only values in registers can be used in computations, this second strategy accommodates that nicely.

From the above, this should be clear: it's easy to determine that the virtual register mapped to factor is the same as the constant '8', and so all multiplications by factor are multiplications by 8. Even if factor is modified later, that's a 'new' variable and it doesn't affect prior uses of factor.
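Steps (1) and (2) can be imitated by hand in plain C. This is only a source-level sketch of the idea, not what any particular compiler's internal representation looks like; all function names are invented for the illustration.

```c
/* Direct form, as a programmer would write it. */
int combine_direct(int b, int c, int k)
{
    int a = b * c + 2 * k;
    return a;
}

/* Same computation with every intermediate written exactly once,
   mirroring the t1/t2/t3 breakdown from step (2). */
int combine_single_assignment(int b, int c, int k)
{
    const int t1 = b * c;
    const int t2 = 2;
    const int t3 = k * t2;
    const int a  = t1 + t3;
    return a;
}

/* Step (1): a variable assigned twice ... */
int reassigned(int n)
{
    int x = n + 1;
    x = x * 2;
    return x;
}

/* ... becomes two 'versions', each assigned in exactly one place. */
int renamed(int n)
{
    const int x1 = n + 1;
    const int x2 = x1 * 2;
    return x2;
}
```

Because every name is written once, it is easy to see which uses depend on which definitions, which is the property that makes the later rearranging and register mapping steps tractable.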

Another implication, if you write

 vara = varb;

.. it may or may not be the case that there is a corresponding copy in the code. For instance

int *resultp= ...
int acc = arr[0] + arr[1];
int acc0 = acc;    // save this for later
int more = func(resultp,3)+ func(resultp,-3);
acc += more;         // add some more stuff
if( ...){
    resultp = getptr();
    resultp[0] = acc0;
    resultp[1] = acc;
}

In the above the two 'versions' of acc (initial, and after adding 'more') could be in two different registers, and 'acc0' would then be the same as the initial 'acc'. So no register copy would be needed for 'acc0 =acc'.
Another point: the 'resultp' is assigned to twice, and since the second assignment ignores the previous value, there are essentially two distinct 'resultp' variables in the code, and this is easily determined by analysis.

An implication of all this: don't be hesitant to break out complex expressions into smaller ones using additional locals for intermediates, if it makes the code easier to follow. There is basically zero run-time penalty for this, since the optimizer sees the same thing anyway.
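A small sketch of that advice, using a made-up fixed-point blend as the example computation (the function names and the formula are assumptions chosen only for illustration):

```c
/* One dense expression. */
int blend_dense(int a, int b, int w)
{
    return (a * w + b * (256 - w) + 128) / 256;
}

/* The same computation broken out into named intermediates. */
int blend_stepwise(int a, int b, int w)
{
    int weighted_a = a * w;
    int weighted_b = b * (256 - w);
    int rounded    = weighted_a + weighted_b + 128;
    return rounded / 256;
}
```

With optimization enabled, both versions would be expected to compile to essentially the same instructions, so the choice between them can be made purely on readability.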

If you're interested in learning more you could start here: http://en.wikipedia.org/wiki/Static_single_assignment_form

The point of this answer is to (a) give some idea of how modern compilers work and (b) point out that asking the compiler, if it would be so kind, to put a particular local variable into a register -- doesn't really make sense. Each 'variable' may be seen by the optimizer as several variables, some of which may be heavily used in loops, and others not. Some variables will vanish -- e.g. by being constant, or sometimes the temp variable used in a swap, or calculations whose results are never actually used. The compiler is equipped to use the same register for different things in different parts of the code, according to what's actually best on the machine you are compiling for.

The notion of hinting the compiler as to which variables should be in registers assumes that each local variable maps to a register or to a memory location. This was true back when Kernighan + Ritchie designed the C language, but is not true any more.

Regarding the restriction that you can't take the address of a register variable: Clearly, there's no way to implement taking the address of a variable held in a register, but you might ask - since the compiler has discretion to ignore the 'register' - why is this rule in place? Why can't the compiler just ignore the 'register' if I happen to take the address? (as is the case in C++).

Again, you have to go back to the old compiler. The original K+R compiler would parse a local variable declaration, and then immediately decide whether to assign it to a register or not (and if so, which register). Then it would proceed to compile expressions, emitting the assembler for each statement one at a time. If it later found that you were taking the address of a 'register' variable, which had been assigned to a register, there was no way to handle that, since the assignment was, in general, irreversible by then. It was possible, however, to generate an error message and stop compiling.

Bottom line, it appears that 'register' is essentially obsolete:

C++ compilers ignore it completely
C compilers ignore it except to enforce the restriction about & - and possibly don't ignore it at -O0 where it could actually result in allocation as requested. At -O0 you aren't concerned about code speed though.

So, it's basically there now for backward compatibility, and probably on the basis that some implementations could still be using it for 'hints'. I never use it -- and I write real-time DSP code, and spend a fair bit of time looking at generated code and finding ways to make it faster. There are many ways to modify code to make it run faster, and knowing how compilers work is very helpful. It's been a long time indeed since I last found adding 'register' to be among those ways.

Addendum

I excluded above, from my special definition of 'locals', variables to which & is applied (these are of course included in the usual sense of the term).

Consider the code below:

void
somefunc()
{
    int h,w;
    int i,j;
    extern int pitch;

    get_hw( &h,&w );  // get shape of array

    for( i = 0; i < h; i++ ){          // use the i,j declared above
        for( j = 0; j < w; j++ ){
            Arr[i*pitch + j] = generate_func(i,j);
        }
    }
}

This may look perfectly harmless. But if you are concerned about execution speed, consider this: The compiler is passing the addresses of h and w to get_hw, and then later calling generate_func. Let's assume the compiler knows nothing about what's in those functions (which is the general case). The compiler must assume that the call to generate_func could be changing h or w. That's a perfectly legal use of the pointer passed to get_hw - you could store it somewhere and then use it later, as long as the scope containing h,w is still in play, to read or write those variables.

Thus the compiler must store h and w in memory on the stack, and can't determine anything in advance about how long the loop will run. So certain optimizations will be impossible, and the loop could be less efficient as a result (in this example, there's a function call in the inner loop anyway, so it may not make much of a difference, but consider the case where there's a function which is occasionally called in the inner loop, depending on some condition).

Another issue here is that generate_func could change pitch, and so i*pitch needs to be done each time, rather than only when i changes.

It can be recoded as:

void
somefunc()
{
    int h0,w0;
    int h,w;
    int i,j;
    extern int pitch;
    int apit = pitch;

    get_hw( &h0,&w0 );  // get shape of array
    h= h0;
    w= w0;

    for( i = 0; i < h; i++ ){          // use the i,j declared above
        for( j = 0; j < w; j++ ){
            Arr[i*apit + j] = generate_func(i,j);
        }
    }
}

Now the variables apit,h,w are all 'safe' locals in the sense I defined above, and the compiler can be sure they won't be changed by any function calls. Assuming I'm not modifying anything in generate_func, the code will have the same effect as before but could be more efficient.
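For anyone who wants to try this, here is a self-contained sketch of the recoded version. The definitions of Arr, pitch, get_hw and generate_func are hypothetical stand-ins invented so the code compiles on its own; the original leaves them unspecified.

```c
/* Hypothetical stand-ins so the sketch is self-contained. */
enum { MAX_H = 4, MAX_W = 8 };
int pitch = MAX_W;
int Arr[MAX_H * MAX_W];

void get_hw(int *h, int *w) { *h = 3; *w = 5; }
int generate_func(int i, int j) { return 10 * i + j; }

void somefunc(void)
{
    int h0, w0;
    int apit = pitch;          /* local copy: no call can change it */

    get_hw(&h0, &w0);          /* only h0 and w0 have their address taken */
    int h = h0;                /* h, w and apit are 'safe' locals */
    int w = w0;

    for (int i = 0; i < h; i++)
        for (int j = 0; j < w; j++)
            Arr[i * apit + j] = generate_func(i, j);
}
```

The extra copies cost nothing in the generated code when optimizing, while they tell the compiler that the loop bounds and the pitch cannot change behind its back.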

Jens Gustedt has suggested the use of the 'register' keyword as a way of tagging key variables to inhibit the use of & on them, e.g. by others maintaining the code (it won't affect the generated code, since the compiler can determine the lack of & without it). For my part, I always think carefully before applying & to any local scalar in a time-critical area of the code, and in my view using 'register' to enforce this is a little cryptic, but I can see the point (unfortunately it doesn't work in C++, since the compiler will just ignore the 'register').

Incidentally, in terms of code efficiency, the best way to have a function return two values is with a struct:

struct hw {  // this is what get_hw returns
   int h,w;
};

void
somefunc()
{
    int h,w;
    int i,j;

    struct hw hwval = get_hw();  // get shape of array
    h = hwval.h;
    w = hwval.w;
    ...

This may look cumbersome (and is cumbersome to write), but it will generate cleaner code than the previous examples. The 'struct hw' will actually be returned in two registers (on most modern ABIs anyway). And due to the way the 'hwval' struct is used, the optimizer will effectively break it up into two 'locals' hwval.h and hwval.w, and then determine that these are equivalent to h and w -- so hwval will essentially disappear in the code. No pointers need to be passed, no function is modifying another function's variables via pointer; it's just like having two distinct scalar return values. This is much easier to do now in C++11 - with std::tie and std::tuple, you can use this method with less verbosity (and without having to write a struct definition).
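A complete, compilable sketch of the struct-return approach might look like the following. The body of get_hw (fixed dimensions 3 and 5) and the helper cell_count are assumptions made up for the illustration.

```c
struct hw {            /* this is what get_hw returns */
    int h, w;
};

/* Hypothetical implementation with fixed dimensions. */
struct hw get_hw(void)
{
    struct hw shape = { 3, 5 };
    return shape;
}

/* No pointers to locals are passed anywhere, so h and w stay
   'safe' locals in the sense used above. */
int cell_count(void)
{
    struct hw hwval = get_hw();
    int h = hwval.h;
    int w = hwval.w;
    return h * w;
}
```

Whether the struct really comes back in registers depends on the target ABI, so checking the generated assembly is the only way to be sure on a given platform.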
