Dereferencing type-punned pointer will break strict-aliasing rules

Posted on 2024-09-09 06:01:56

I used the following piece of code to read data from files as part of a larger program.

double data_read(FILE *stream, int code) {
    char data[8];
    switch (code) {
    case 0x08:
        return (unsigned char)fgetc(stream);
    case 0x09:
        return (signed char)fgetc(stream);
    case 0x0b:
        data[1] = fgetc(stream);
        data[0] = fgetc(stream);
        return *(short*)data;
    case 0x0c:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(int*)data;
    case 0x0d:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(float*)data;
    case 0x0e:
        for (int i = 7; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(double*)data;
    }
    die("data read failed");
    return 1;
}

Now I am told to use -O2, and I get the following gcc warning:
warning: dereferencing type-punned pointer will break strict-aliasing rules

Googling, I found two contradictory answers:

vs

In the end I don't want to ignore the warnings. What would you recommend?

[update] I substituted the toy example with the real function.

Comments (7)

本王不退位尔等都是臣 2024-09-16 06:01:56

The problem occurs because you access a char-array through a double*:

char data[8];
...
return *(double*)data;

But gcc assumes that your program will never access variables through pointers of a different type. This assumption is called strict aliasing, and it allows the compiler to make some optimizations:

If the compiler knows that your *(double*) can in no way overlap with data[], it is allowed to do all sorts of things, like reordering your code into:

return *(double*)data;
for(int i=7;i>=0;i--)
    data[i] = fgetc(stream);

The stores in the loop then look dead to the compiler, so they are most likely optimized away and you effectively end up with just:

return *(double*)data;

Which leaves your data[] uninitialized. In this particular case the compiler might be able to see that your pointers overlap, but if you had declared it char* data, it could have given bugs.

But the strict-aliasing rule has an exception: a pointer to a character type (char, signed char, unsigned char) may be used to access an object of any type. So you can rewrite it into:

double data;
...
*(((char*)&data) + i) = fgetc(stream);
...
return data;

Strict aliasing warnings are really important to understand or fix. They cause the kinds of bugs that are impossible to reproduce in-house because they occur only on one particular compiler on one particular operating system on one particular machine and only on full-moon and once a year, etc.
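
A minimal sketch of the character-pointer rewrite described above, written out for the 0x0e (double) case. It assumes an 8-byte double, a file that stores the value most significant byte first, and a little-endian host (which is what the reversed index in the question's loop implies). The name read_be_double and the EOF handling are illustrative assumptions, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: double is 8 bytes, as the question's code also assumes. */
static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

/* Hypothetical helper: reads 8 bytes, most significant first, into a double
 * on a little-endian host. Returns 0 on success, -1 on EOF or read error. */
int read_be_double(FILE *stream, double *out)
{
    double value;
    /* Accessing the bytes of any object through a character type is
     * permitted, so this does not break the strict-aliasing rule. */
    unsigned char *bytes = (unsigned char *)&value;

    for (int i = 7; i >= 0; i--) {
        int c = fgetc(stream);
        if (c == EOF)
            return -1;
        bytes[i] = (unsigned char)c;
    }
    *out = value;
    return 0;
}
```

The direction of the cast is what matters here: the object is declared as a double and written through a character pointer, instead of being declared as a char array and read through a double pointer.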

请远离我 2024-09-16 06:01:56

It looks a lot as if you really want to use fread:

int data;
fread(&data, sizeof(data), 1, stream);

That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:

union
{
    char theChars[4];
    int theInt;
} myunion;

for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;

I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.

Note that both your code and mine are highly non-portable.

Edit: If you want to read ints of various lengths from a file, portably, try something like this:

unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);

(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)

This reads a 4-byte unsigned from the file in big-endian format (the first byte read ends up as the most significant one), regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.

If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.
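
A minimal sketch of the bit-shift approach with the EOF check added. It treats the first byte read as the most significant one (big-endian). The name read_be_u32, the use of uint32_t, and the 0/-1 return convention are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads a 4-byte big-endian unsigned integer using only
 * shifts, so the result does not depend on the host's byte order.
 * Returns 0 on success, -1 on EOF or read error. */
int read_be_u32(FILE *stream, uint32_t *out)
{
    uint32_t result = 0;

    for (int i = 0; i < 4; i++) {
        int c = fgetc(stream);      /* keep the int so EOF can be detected */
        if (c == EOF)
            return -1;
        result = (result << 8) | (uint32_t)c;
    }
    *out = result;
    return 0;
}
```

Because the value is assembled arithmetically, no pointer cast or union is involved, so there is nothing for the strict-aliasing rule to object to.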

你曾走过我的故事 2024-09-16 06:01:56

Using a union is not the correct thing to do here. Reading from an unwritten member of the union is undefined - i.e. the compiler is free to perform optimisations that will break your code (like optimising away the write).

转瞬即逝 2024-09-16 06:01:56

This doc summarizes the situation: http://dbp-consulting.com/tutorials/StrictAliasing.html

There are several different solutions there, but the most portable/safe one is to use memcpy(). (The function calls may be optimized out, so it's not as inefficient as it appears.) For example, replace this:

return *(short*)data;

With this:

short temp;
memcpy(&temp, data, sizeof(temp));
return temp;
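
The same replacement applies to the other cases in the question. Below is a minimal sketch for the 0x0d (float) case, assuming a little-endian host and a file that stores the value most significant byte first, as the question's reversed loops imply. The helper names read_reversed and read_float and the EOF handling are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills buf[n-1] down to buf[0] from the stream,
 * mirroring the reversed loops in the question.
 * Returns 0 on success, -1 on EOF or read error. */
static int read_reversed(FILE *stream, unsigned char *buf, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        int c = fgetc(stream);
        if (c == EOF)
            return -1;
        buf[i] = (unsigned char)c;
    }
    return 0;
}

/* Reads a float without dereferencing a type-punned pointer. */
int read_float(FILE *stream, float *out)
{
    unsigned char data[sizeof(float)];

    if (read_reversed(stream, data, (int)sizeof data) != 0)
        return -1;
    memcpy(out, data, sizeof *out);   /* copy the bytes instead of casting */
    return 0;
}
```

The short, int, and double cases would follow the same pattern with a different buffer size and destination type.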

已下线请稍等 2024-09-16 06:01:56

Basically you can read gcc's message as "Guy, you are looking for trouble; don't say I didn't warn ya."

Casting a three byte character array to an int is one of the worst things I have seen, ever. Normally your int has at least 4 bytes. So for the fourth (and maybe more if int is wider) you get random data. And then you cast all of this to a double.

Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.

叹倦 2024-09-16 06:01:56

The authors of the C Standard wanted to let compiler writers generate efficient code in circumstances where it would be theoretically possible but unlikely that a global variable might have its value accessed using a seemingly-unrelated pointer. The idea wasn't to forbid type punning by casting and dereferencing a pointer in a single expression, but rather to say that given something like:

int x;
int foo(double *d)
{
  x++;
  *d=1234;
  return x;
}

a compiler would be entitled to assume that the write to *d won't affect x. The authors of the Standard wanted to list the situations in which a function like the one above, receiving a pointer from an unknown source, would have to assume that it might alias a seemingly unrelated global, without requiring that the types match perfectly. Unfortunately, while the rationale strongly suggests that the authors intended to describe a minimum standard of conformance for cases where a compiler would otherwise have no reason to believe that things might alias, the rule fails to require that compilers recognize aliasing in cases where it is obvious. The authors of gcc have decided that they would rather generate the smallest program they can while conforming to the poorly written language of the Standard than generate code that is actually useful. Instead of recognizing aliasing where it is obvious (while still being able to assume that things that don't look like they'll alias, won't), they would rather require that programmers use memcpy, which forces a compiler to allow for the possibility that pointers of unknown origin might alias just about anything, thus impeding optimization.

山田美奈子 2024-09-16 06:01:56

Apparently the standard allows sizeof(char*) to be different from sizeof(int*) so gcc complains when you try a direct cast. void* is a little special in that everything can be converted back and forth to and from void*.
In practice I don't know of many architectures or compilers where pointers are not the same size for all types, but gcc is right to emit a warning even if it is annoying.

I think the safe way would be

int i, *p = &i;
char *q = (char*)&p[0];

or

char *q = (char*)(void*)p;

You can also try this and see what you get:

char *q = reinterpret_cast<char*>(p);