C question: why does a char actually occupy 4 bytes in memory?

Posted on 2024-10-19 16:29:02

I wrote a small program to check how many bytes a char occupies in memory, and it suggests that a char actually occupies 4 bytes. I understand this is mostly because of word alignment, and I don't see the advantage of a char being only 1 byte. Why not use 4 bytes for char?

#include <stdio.h>

int main(void)
{
  int a;
  char b;
  int c;
  a = 0;
  b = 'b';
  c = 1;
  /* %p expects a void *, so cast each address */
  printf("%p\n", (void *)&a);
  printf("%p\n", (void *)&b);
  printf("%p\n", (void *)&c);
  return 0;
}

Output:
0x7fff91a15c58
0x7fff91a15c5f
0x7fff91a15c54

Update:
I don't believe that malloc will allocate only 1 byte for a char, even if sizeof(char) is passed as the argument, because malloc keeps a header and will make sure that header is word aligned. Any comments?

Update 2:
If you are asked to use memory efficiently, without padding, is the only way to write a special memory allocator? Or is it possible to disable padding?

生生漫 2024-10-26 16:29:02

You have int, char, int

See the image here under "Why Restrict Byte Alignment?"
http://www.eventhelix.com/realtimemantra/ByteAlignmentAndOrdering.htm

        Byte 0  Byte 1  Byte 2  Byte 3
0x1000
0x1004  X0      X1      X2      X3
0x1008
0x100C          Y0      Y1      Y2

If the compiler had stored them packed as 4 bytes, 1 byte and 4 bytes with no padding, int c would straddle a word boundary. Retrieving it would then take two memory accesses plus some bit-shifting to assemble the actual value of c for use as an int.

岛徒 2024-10-26 16:29:02

Alignment

Let's look at your output for printing the addresses of a, b, and c:

Output: 0x7fff91a15c58 0x7fff91a15c5f 0x7fff91a15c54

Notice that b isn't on a 4-byte boundary, and that a and c are next to each other? Here is what it looks like in memory, with each row holding 4 bytes and the rightmost column being the lowest address (offset 0):

| b | x | x | x | 0x5c5c
-----------------
| a | a | a | a | 0x5c58 
-----------------
| c | c | c | c | 0x5c54

This is the compiler's way of optimizing space while keeping things word aligned. Even though b sits at 0x5c5f, in a word of its own, it isn't actually taking up 4 bytes; the bytes marked x are unused padding. If you take your same code and add a short d, you'll see this:

| b | x | d | d | 0x5c5c
-----------------
| a | a | a | a | 0x5c58 
-----------------
| c | c | c | c | 0x5c54

Where the address of d is 0x5c5c. Shorts are aligned to two bytes, so you will still have one byte of unused memory between d and b. Add in another char e, and you'll get:

| b | e | d | d | 0x5c5c
-----------------
| a | a | a | a | 0x5c58 
-----------------
| c | c | c | c | 0x5c54

Here's my code and the output. My addresses differ from yours, but it's the least significant digit of each address that we're really concerned about anyway:

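#include <stdio.h>

int main(void)
{
  int a;
  char b;
  int c;
  short d;
  char e;
  a = 0;
  b = 'b';
  c = 1;
  /* d and e are never read; only their addresses are printed */
  printf("%p\n", (void *)&a);
  printf("%p\n", (void *)&b);
  printf("%p\n", (void *)&c);
  printf("%p\n", (void *)&d);
  printf("%p\n", (void *)&e);
  return 0;
}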

$ ./a.out 
0xbfa0bde8
0xbfa0bdef
0xbfa0bde4
0xbfa0bdec
0xbfa0bdee

Malloc

The man page of malloc says that it "allocates size bytes and returns a pointer to the allocated memory." It also says that it will "return a pointer to the allocated memory, which is suitably aligned for any kind of variable". So malloc(1) gives you one usable byte, but the allocator is free to space blocks further apart. From my testing, repeated calls to malloc(1) return addresses in "double word" increments, but I wouldn't count on this.

Caveats

My code was run on an x86 32-bit machine. Other machines might vary slightly, and some compilers may optimize in different ways, but the ideas should hold true.

远山浅 2024-10-26 16:29:02

The variable itself doesn't occupy 4 bytes of memory. It occupies 1 byte, and the other 3 bytes of its word are padding, because the neighboring variable on the stack is an int and therefore has to be word aligned.

In a case like the one below, you will typically find that anotherChar sits in the byte right next to b (one byte above or below it, depending on the compiler). That leaves only 2 bytes of padding next to int c.

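#include <stdio.h>

int main(void)
{
  int a;
  char b;
  char anotherChar;
  int c;
  a = 0;
  b = 'b';
  c = 1;
  /* anotherChar is never read; only its address is printed */
  printf("%p\n", (void *)&a);
  printf("%p\n", (void *)&b);
  printf("%p\n", (void *)&anotherChar);
  printf("%p\n", (void *)&c);
  return 0;
}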
