C: strange array behavior

Posted on 2024-09-01 10:29:15

After learning that strncmp is not what it seems to be, and that strlcpy is not available on my operating system (Linux), I figured I could try to write it myself.

I found a quote from Ulrich Drepper, the libc maintainer, who posted an alternative to strlcpy using mempcpy. I don't have mempcpy either, but its behaviour was easy to replicate. First off, this is the test case I have:

#include <stdio.h>
#include <string.h>

#define BSIZE 10

void insp(const char* s, int n)
{
   int i;

   for (i = 0; i < n; i++)
      printf("%c  ", s[i]);

   printf("\n");

   for (i = 0; i < n; i++)
      printf("%02X ", s[i]);

   printf("\n");

   return;
}

int copy_string(char *dest, const char *src, int n)
{
   int r = strlen(memcpy(dest, src, n-1));
   dest[r] = 0;

   return r;
}

int main()
{
   char b[BSIZE];
   memset(b, 0, BSIZE);

   printf("Buffer size is %d", BSIZE);

   insp(b, BSIZE);

   printf("\nFirst copy:\n");
   copy_string(b, "First", BSIZE);
   insp(b, BSIZE);
   printf("b = '%s'\n", b);

   printf("\nSecond copy:\n");
   copy_string(b, "Second", BSIZE);
   insp(b, BSIZE);

   printf("b = '%s'\n", b);

   return 0;
}

And this is its result:

Buffer size is 10                    
00 00 00 00 00 00 00 00 00 00 

First copy:
F  i  r  s  t     b     =    
46 69 72 73 74 00 62 20 3D 00 
b = 'First'

Second copy:
S  e  c  o  n  d          
53 65 63 6F 6E 64 00 00 01 00 
b = 'Second'

You can see in the internal representation (the lines insp() created) that there's some noise mixed in, like the printf() format string in the inspection after the first copy, and a foreign 0x01 in the second copy.

The strings are copied intact, and source strings that are too long are handled correctly (let's ignore the possible issue with passing 0 as the length to copy_string for now; I'll fix that later).

But why are there foreign array contents (from the format string) inside my destination? It's as if the destination was actually RESIZED to match the new length.


Comments (4)

过期以后 2024-09-08 10:29:15

The end of the string is marked by a \0; the memory after that can be anything. Unless your OS deliberately blanks it, it's just whatever random junk was left there.

Note that in this case copy_string does exactly what it was told: it copies n-1 (9) bytes. The memory after "First" in your program, however, is just whatever happens to be there.

花心好男孩 2024-09-08 10:29:15

Because you are not stopping at the source size; you are stopping at the destination size, which happens to be bigger than the source, so you are copying the source string plus a bit of garbage past it.

You can easily see that you are copying your source string along with its null terminator. But since you are memcpy-ing n-1 (9) bytes, and both "First" and "Second" occupy fewer than 9 bytes, you are also copying the extra bytes past them.
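To put numbers on that, here is a small sketch of the arithmetic. The helper name overread is made up for illustration and is not part of the question's code; it only counts how many bytes a memcpy(dest, src, n-1) call would read beyond the source's terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: how many bytes would memcpy(dest, src, n-1)
 * read beyond src's terminating '\0'?
 * n == 0 is treated as "copy nothing" here; the question's int version
 * would instead end up passing a huge size to memcpy.
 */
size_t overread(const char *src, size_t n)
{
    assert(src != NULL);

    size_t have  = strlen(src) + 1;     /* bytes that belong to the string */
    size_t asked = n > 0 ? n - 1 : 0;   /* bytes memcpy is told to copy */

    return asked > have ? asked - have : 0;
}
```

With the question's BSIZE of 10, overread("First", 10) gives 3 and overread("Second", 10) gives 2. That lines up with the dumps: three stray bytes (62 20 3D) after "First" and its terminator, and two (00 01) after "Second".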

倒带 2024-09-08 10:29:15

The use of memcpy(dest, src, n-1) invokes undefined behavior if dest and src are not both at least n-1 in length.

For example, First\0 is six characters in length, but you read n-1 (9) characters from it; the contents of the memory past the end of the string literal are undefined, as is the behavior of your program when you read that memory.
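One way to avoid reading past the source is to find out how much of it there is before calling memcpy. The following is a minimal sketch, not the asker's final code: the name copy_string_bounded is invented here, and it assumes n is the full size of the destination buffer, terminator included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical replacement for the question's copy_string().
 * n is the total size of dest in bytes, terminator included.
 * Returns the number of characters stored, not counting the '\0'.
 */
size_t copy_string_bounded(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);

    if (n == 0)
        return 0;               /* no room even for the terminator */

    /* Count source characters, stopping at the terminator or at n-1,
       whichever comes first, so src is never read past its '\0'. */
    size_t len = 0;
    while (len < n - 1 && src[len] != '\0')
        len++;

    memcpy(dest, src, len);     /* copy only bytes that belong to src */
    dest[len] = '\0';

    return len;
}
```

Bytes of dest beyond the new terminator are left untouched, so a dump of the whole buffer still shows whatever was there before, but nothing from outside the source string gets copied in.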

烟火散人牵绊 2024-09-08 10:29:15

The extra "stuff" is there because you've passed the buffer size to memcpy. It's going to copy that many characters, even when the source is shorter.

I'd do things a bit differently:

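#include <string.h>

/* n is the size of dest in bytes, terminator included */
void copy_string(char *dest, char const *src, size_t n) {
    if (n == 0) return;
    *dest = '\0';
    strncat(dest, src, n - 1);  /* strncat writes up to n-1 chars plus a '\0' */
}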

Unlike strncpy, strncat is defined to work how most people would reasonably expect.
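The difference can be checked with a small experiment. This is only an illustrative sketch: the function names and the 4-byte buffer are arbitrary choices, not anything from the question. Each function fills a buffer with 'x', copies src into it, and reports whether a terminator ended up inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a '\0', 0 otherwise. */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Copy with strncpy, limited to the buffer size. */
int strncpy_terminates(const char *src)
{
    char buf[4];

    assert(src != NULL);
    memset(buf, 'x', sizeof buf);
    strncpy(buf, src, sizeof buf);      /* writes no '\0' if src has 4+ chars */

    return is_terminated(buf, sizeof buf);
}

/* Copy with strncat onto an empty string, leaving room for the '\0'. */
int strncat_terminates(const char *src)
{
    char buf[4];

    assert(src != NULL);
    memset(buf, 'x', sizeof buf);
    buf[0] = '\0';
    strncat(buf, src, sizeof buf - 1);  /* at most 3 chars, then always a '\0' */

    return is_terminated(buf, sizeof buf);
}
```

For a source such as "abcdef", strncpy_terminates returns 0 because strncpy fills all four bytes and stops, while strncat_terminates returns 1. Note that the count passed to strncat is the number of characters to append, so it has to be one less than the space available.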
