How to safely perform type punning in an embedded system

Posted 2024-12-29 04:12:29

Our team is currently working with code ported from an old architecture to a new product based on the ARM Cortex-M3 platform, built with a customized version of GCC 4.5.1. We are reading data from a communications link and attempting to cast the raw byte array to a struct to parse the data cleanly. After casting the pointer to a struct and dereferencing it, we get a warning: "dereferencing type-punned pointer will break strict-aliasing rules".

After some research, I've realized that since the char array has no alignment requirements and the struct has to be word aligned, casting the pointer causes undefined behavior (a Bad Thing). I'm wondering if there is a better way to do what we're trying.

I know we can explicitly word-align the char array using GCC's "__attribute__((aligned(4)))". I believe this would make our code "safer", but the warnings would still clutter up our builds, and I don't want to disable the warnings in case this situation arises again. What we want is a way to do this safely that will still warn us if we attempt something unsafe elsewhere later. Since this is an embedded system, RAM and flash usage matter to some degree.

Portability (compiler and architecture) is not a huge concern; this is just for one product. However, if a portable solution exists, it would be preferred.

Here is a (very simplified) example of what we are currently doing:

#define MESSAGE_TYPE_A 0
#define MESSAGE_TYPE_B 1

typedef struct __attribute__((__packed__)) MessageA
{
    unsigned char  messageType;
    unsigned short data1;
    unsigned int   data2;
} MessageA;

typedef struct __attribute__((__packed__)) MessageB
{
    unsigned char  messageType;
    unsigned char  data3;
    unsigned char  data4;
} MessageB;


// This gets filled by the comm system, assume from a UART interrupt or similar
unsigned char data[100];


// Assume this gets called once we receive a full message
void ProcessMessage()
{
    MessageA* messageA;
    unsigned char messageType = data[0];

    if (messageType == MESSAGE_TYPE_A)
    {
        // Cast data to struct and attempt to read
        messageA = (MessageA*)data; // Not safe since data may not be word aligned
                                    // This may cause undefined behavior

        if (messageA->data1 == 4) // warning would be here, when we use the data at the pointer
        {
            // Perform some action...
        }
    }
    // ...
    // process different types of messages
}

Comments (6)

夏九 2025-01-05 04:12:29

As has already been pointed out, casting pointers about is a dodgy practice.

Solution: use a union

struct message {
  unsigned char messageType;
  union {
    struct {
      int data1;
      short data2;
    } A;
    struct {
      char data1[5];
      int data2;
    } B;
  } data;
};

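/* getMessage, TYPEA, TYPEB, doStuff and doOtherStuff are placeholders
   for your own receive routine, message-type codes and handlers. */
void func (void) {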
  struct message msg;
  getMessage (&msg);

  switch (msg.messageType) {
    case TYPEA:
      doStuff (msg.data.A.data1);
      break;
    case TYPEB:
      doOtherStuff (msg.data.B.data1);
      break;
  }
}

By this means the compiler knows you're accessing the same data via different means, and the warnings and Bad Things will go away.

Of course, you'll need to make sure the structure alignment and packing match your message format. Beware of endianness issues and the like if the machine on the other end of the link doesn't match.
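Applied to the question's receive buffer, a sketch of this idea could overlay the raw bytes and the packed message in one union, so the driver writes `bytes[]` and the parser reads `a.data1` from the same storage without any pointer cast. `RxBuffer` and `ReadMessageAData1` are hypothetical names, the field types assume a 16-bit `data1` and a 32-bit `data2`, and values come out in the target's own byte order (little-endian on a typical Cortex-M3 configuration).

```c
#include <stdint.h>

#define MESSAGE_TYPE_A 0

/* Packed wire layout from the question: 1 + 2 + 4 = 7 bytes. */
typedef struct __attribute__((__packed__)) {
    uint8_t  messageType;
    uint16_t data1;
    uint32_t data2;
} MessageA;

/* Hypothetical receive buffer: the comm driver fills bytes[], and the
   parser reads the same storage through the a member. */
typedef union {
    uint8_t  bytes[100];
    MessageA a;
} RxBuffer;

/* Returns data1 of a type-A message, or -1 for any other type.
   Both reads go through the union object, the form of type punning
   that GCC documents as supported. Byte order is the target's own. */
long ReadMessageAData1(const RxBuffer *rx)
{
    if (rx->bytes[0] != MESSAGE_TYPE_A)
        return -1;
    return (long)rx->a.data1;
}
```

The ISR would then write into `rx.bytes` instead of a separate `data` array, and the packing still has to match the wire format exactly.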

素罗衫 2025-01-05 04:12:29

Type punning through a pointer cast to any type other than char * (or a pointer to the signed/unsigned variants of char) is not strictly conforming, as it violates the C aliasing rules (and sometimes the alignment rules, if no care is taken).

However, gcc permits type punning through union types. The gcc manpage explicitly documents it:

"The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type."
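A minimal sketch of the pattern that passage describes: write one member, read another, with both accesses made through the union object itself. The `FloatBits` union and `FloatToBits` function are hypothetical names, and a 32-bit IEEE 754 `float` is assumed.

```c
#include <stdint.h>

/* Illustrative union; names are hypothetical. */
typedef union {
    float    f;
    uint32_t u;
} FloatBits;

/* Stores x into the float member and reads the uint32_t member of the
   same union object, the access pattern gcc allows under
   -fstrict-aliasing. Reading through a cast pointer such as
   *(uint32_t *)&x instead would break the aliasing rules. */
uint32_t FloatToBits(float x)
{
    FloatBits fb;
    fb.f = x;
    return fb.u;
}
```

The same manual section warns that taking the address of a member and dereferencing it as another type is not covered by this guarantee.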

To disable gcc's optimizations that rely on the aliasing rules (and thus allow the program to break them), compile with -fno-strict-aliasing. Note that with this option enabled, the program is no longer strictly conforming, but you said portability is not a concern. For reference, the Linux kernel is compiled with this option.

怪我闹别瞎闹 2025-01-05 04:12:29

GCC has a -fno-strict-aliasing flag that disables strict-aliasing-based optimizations, which makes this kind of cast safe from aliasing-related miscompilation.

If you're really looking for a way to "fix" it, you have to rethink the way your code works. You can't just overlay the structure the way you're trying, so you need to do something like this:

MessageA messageA;
messageA.messageType = data[0];
// Watch out - endianness and `sizeof(short)` dependent!
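messageA.data1 = (unsigned short)((data[1] << 8) | data[2]);
// Watch out - endianness and `sizeof(int)` dependent!
// Widen before shifting: data[3] << 24 overflows a signed int when data[3] >= 0x80.
messageA.data2 = ((unsigned int)data[3] << 24) | ((unsigned int)data[4] << 16)
               | ((unsigned int)data[5] <<  8) |  (unsigned int)data[6];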

This method will let you avoid packing your structure, which might also improve its performance characteristics elsewhere in your code. Alternatively:

#include <string.h>  // for memcpy

MessageA messageA;
memcpy(&messageA, data, sizeof messageA);

This will do the same with your packed structures. You would do the reverse operations to translate the structures back into a flat buffer if necessary.
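A sketch that wraps the field-by-field approach in small helpers, assuming the sender transmits multi-byte fields most-significant byte first (as the shifts above do). `ReadBe16`, `ReadBe32` and `DecodeMessageA` are hypothetical names. Because the values are assembled arithmetically, the result does not depend on the receiving CPU's byte order, and the struct can stay unpacked.

```c
#include <stdint.h>

/* Hypothetical helpers: assemble big-endian fields from a byte buffer.
   Each byte is widened to an unsigned type before shifting, so a byte
   >= 0x80 shifted left by 24 cannot overflow a signed int. */
static uint16_t ReadBe16(const unsigned char *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t ReadBe32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Unpacked, naturally aligned version of the question's MessageA. */
typedef struct {
    uint8_t  messageType;
    uint16_t data1;
    uint32_t data2;
} MessageA;

/* Decodes the 7-byte wire layout: type (byte 0), data1 (bytes 1-2),
   data2 (bytes 3-6). */
void DecodeMessageA(const unsigned char *buf, MessageA *out)
{
    out->messageType = buf[0];
    out->data1 = ReadBe16(buf + 1);
    out->data2 = ReadBe32(buf + 3);
}
```

An encoder for the reverse direction would mirror these helpers with right shifts into the byte buffer.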

抠脚大汉 2025-01-05 04:12:29

Stop using packed structures and memcpy the individual fields into variables of the correct size and type. This is the safe, portable, clean way to do what you're trying to achieve. If you're lucky, gcc will optimize the tiny fixed-size memcpy into a few simple load and store instructions.
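A minimal sketch of this per-field memcpy approach for the question's `MessageA` layout. The offset constants and the `UnpackMessageA` name are assumptions for illustration. The struct is left unpacked so its fields are naturally aligned, and the copied values are in the host's byte order, so the sender must use the same order (or the fields must be swapped afterwards).

```c
#include <stdint.h>
#include <string.h>

/* Field offsets of the question's 7-byte wire layout
   (1-byte type, 2-byte data1, 4-byte data2); hypothetical names. */
enum { MSG_A_DATA1_OFFSET = 1, MSG_A_DATA2_OFFSET = 3 };

/* Natural alignment, no packing. */
typedef struct {
    uint8_t  messageType;
    uint16_t data1;
    uint32_t data2;
} MessageA;

/* Copies each field out of the raw buffer with a fixed-size memcpy.
   memcpy has no alignment requirement and may read any object's bytes,
   so neither the aliasing nor the alignment rules are broken. */
void UnpackMessageA(const unsigned char *buf, MessageA *out)
{
    out->messageType = buf[0];
    memcpy(&out->data1, buf + MSG_A_DATA1_OFFSET, sizeof out->data1);
    memcpy(&out->data2, buf + MSG_A_DATA2_OFFSET, sizeof out->data2);
}
```

Whether the compiler turns each fixed-size memcpy into a single load depends on the GCC version and optimization level, so it is worth checking the generated code if flash or cycle counts matter.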

樱花坊 2025-01-05 04:12:29

The Cortex-M3 can handle unaligned accesses just fine for ordinary word and halfword loads and stores (LDRD/STRD and LDM/STM still need aligned addresses). I have done this in similar packet-processing systems with the M3. You don't need to change anything; you can just use the -fno-strict-aliasing flag to get rid of the warning.

樱&纷飞 2025-01-05 04:12:29

For unaligned accesses, look at the Linux kernel macros get_unaligned/put_unaligned.
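Outside the kernel those macros aren't available, but a rough user-space stand-in can be built on `memcpy`. This is not the kernel's implementation; `get_unaligned_u16`, `get_unaligned_u32`, `put_unaligned_u16` and `put_unaligned_u32` are hypothetical names, and values are read and written in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers inspired by the kernel's get_unaligned() and
   put_unaligned(). memcpy works at any address and may access any
   object's bytes, so these avoid both alignment faults and
   strict-aliasing violations. Host byte order throughout. */
static inline uint16_t get_unaligned_u16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline uint32_t get_unaligned_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline void put_unaligned_u16(void *p, uint16_t v)
{
    memcpy(p, &v, sizeof v);
}

static inline void put_unaligned_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

With these, the question's check could read `get_unaligned_u16(data + 1) == 4` instead of dereferencing a cast pointer.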
