C standard addressing simplification inconsistency

Posted on 2024-10-16 08:46:57

§6.5.3.2 "Address and indirection operators" ¶3 says (relevant part only):

The unary & operator returns the address of its operand. ...
If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. ...

This means that the following:

#define NUM 10
int tmp[NUM];
int *i = tmp;
printf("%ti\n", (ptrdiff_t) (&*i - i) );
printf("%ti\n", (ptrdiff_t) (&i[NUM] - i) );

should be perfectly legal, printing 0 and NUM (10). The standard seems very clear that both of those cases are required to be optimized.

However, it doesn't seem to require the following to be optimized:

struct { int a; short b; } tmp, *s = &tmp;
printf("%ti\n", (ptrdiff_t) ((char *)&s->b - (char *)s) );

This seems awfully inconsistent. I can see no reason why the above code shouldn't print sizeof(int) plus any padding (unlikely here), so probably 4.

Simplifying an &-> expression is conceptually the same (IMHO) as simplifying &[]: a simple address plus offset. The offset is even determinable at compile time, whereas with the [] operator it may only be known at run time.

Is there anything in the rationale about why this is so seemingly inconsistent?

Comments (3)

多情出卖 2024-10-23 08:46:57

In your example, &i[10] is actually not legal: it becomes i + 10, which becomes NULL + 10, and you can't perform arithmetic on a null pointer. (6.5.6/8 lists the conditions under which pointer arithmetic can be performed)

Anyway, this rule was added in C99; it was not present in C89. My understanding is that it was added in large part to make code like the following well-defined:

int* begin, * end;
int v[10];

begin = &v[0];
end = &v[10];

That last line is technically invalid in C89 (and in C++) but is allowed in C99 because of this rule. It was a relatively minor change that made a commonly used construct well-defined.

Because you can't perform arithmetic on a null pointer, your example (&s->b) would be invalid anyway.

As for why there is this "inconsistency," I can only guess. It's likely that no one thought to make it consistent or no one saw a compelling use case for this. It's possible that this was considered and ultimately rejected. There are no remarks about the &* reduction in the Rationale. You might be able to find some definitive information in the WG14 papers, but unfortunately they seem to be quite poorly organized, so trawling through them may be tedious.

逐鹿 2024-10-23 08:46:57

I think the rule wasn't added for optimization purposes (what does it bring that the as-if rule doesn't?) but to allow &t[sizeof(t)/sizeof(*t)] and &*(t+sizeof(t)/sizeof(*t)), which would be undefined behaviour without it (writing such things directly may seem silly, but add a layer or two of macros and it can make sense). I don't see a case where special-casing &p->m would bring a similar benefit. Note that, as James pointed out, &p[10] with p a null pointer is still undefined behaviour; &p->m with p a null pointer would similarly have stayed invalid (and I must admit that I don't see any use for it when p is the null pointer).

水染的天色ゝ 2024-10-23 08:46:57

I believe the compiler can choose to lay out a struct in different ways, possibly adding padding between members to speed up memory access. This means you can't say for sure that b will always be at an offset of 4. A single value does not have the same problem.

Also, the compiler may not know the layout of a struct in memory during the optimization phase, which would prevent any sort of optimization involving struct member accesses and subsequent pointer casts.

Edit:

I have another theory...

Many compilers optimize the abstract syntax tree just after lexical analysis and parsing. At that stage they find things like operators that cancel out and expressions that evaluate to a constant, and reduce those sections of the tree to a single node. This also means that information about structs is not yet available. Later optimization passes that run after some code generation may be able to take struct layout into account because they have additional information, but for things like trimming the AST, that information is not there yet.
