Strange array behaviour in C
After learning that strncmp is not what it seems to be and that strlcpy is not available on my operating system (Linux), I figured I could try to write it myself.
I found a quote from Ulrich Drepper, the libc maintainer, who posted an alternative to strlcpy using mempcpy. I don't have mempcpy either, but its behaviour was easy to replicate. First off, this is the test case I have:
#include <stdio.h>
#include <string.h>

#define BSIZE 10

void insp(const char* s, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%c ", s[i]);
    printf("\n");
    for (i = 0; i < n; i++)
        printf("%02X ", s[i]);
    printf("\n");
    return;
}

int copy_string(char *dest, const char *src, int n)
{
    int r = strlen(memcpy(dest, src, n-1));
    dest[r] = 0;
    return r;
}

int main()
{
    char b[BSIZE];
    memset(b, 0, BSIZE);
    printf("Buffer size is %d", BSIZE);
    insp(b, BSIZE);
    printf("\nFirst copy:\n");
    copy_string(b, "First", BSIZE);
    insp(b, BSIZE);
    printf("b = '%s'\n", b);
    printf("\nSecond copy:\n");
    copy_string(b, "Second", BSIZE);
    insp(b, BSIZE);
    printf("b = '%s'\n", b);
    return 0;
}
And this is its result:
Buffer size is 10
00 00 00 00 00 00 00 00 00 00
First copy:
F i r s t b =
46 69 72 73 74 00 62 20 3D 00
b = 'First'
Second copy:
S e c o n d
53 65 63 6F 6E 64 00 00 01 00
b = 'Second'
You can see in the internal representation (the lines insp() created) that there's some noise mixed in, like the printf() format string in the inspection after the first copy, and a foreign 0x01 in the second copy.
The strings are copied intact and it correctly handles source strings that are too long (let's ignore the possible issue with passing 0 as the length to copy_string for now; I'll fix that later).
But why are there foreign array contents (from the format string) inside my destination? It's as if the destination was actually RESIZED to match the new length.
Comments (4)
The end of the string is marked by a \0; the memory after that can be anything. Unless something deliberately blanks it, it's just whatever random junk was left there.
Note that in this case the 'problem' isn't in how copy_string terminates the string: you are copying exactly n-1 = 9 bytes, but the memory after "First" in your main code is just random.
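To make the point about the terminator concrete, here is a small sketch. The helper names show_both_views and fill_like_first_copy are made up for illustration, and the ten bytes are simply the ones from the first dump in the question. It shows that the string functions stop at the first \0 while the array still holds whatever was written after it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print buf the way "%s" sees it, then every byte
   of the underlying array. Assumes buf holds a \0 within size bytes. */
void show_both_views(const char *buf, size_t size)
{
    size_t i;
    printf("as string: '%s' (strlen = %zu)\n", buf, strlen(buf));
    printf("raw bytes:");
    for (i = 0; i < size; i++)
        printf(" %02X", (unsigned char)buf[i]); /* cast avoids sign extension */
    printf("\n");
}

/* Hypothetical helper: fill a 10-byte array with the bytes shown in the
   question's dump after the first copy ("First", \0, then "b =", \0). */
void fill_like_first_copy(char b[10])
{
    static const char bytes[10] =
        { 'F', 'i', 'r', 's', 't', '\0', 'b', ' ', '=', '\0' };
    memcpy(b, bytes, sizeof bytes);
}
```

Calling fill_like_first_copy on a char[10] and then show_both_views on it prints a five-character string even though bytes 6 to 8 of the array are not zero. The array was never resized; the string is just shorter than the array.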
Because you are not stopping at the source size, you are stopping at the destination size, which happens to be bigger than the source, so you are copying the source string plus a bit of garbage past it.
You can easily see that you are copying your source string, with its null terminator. But since you are passing 9 bytes (n-1) to memcpy and both strings, "First" and "Second", are shorter than that even with their terminators, you are also copying the extra bytes past them.
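One way to stop at the source size instead might look like the sketch below. The name copy_string_bounded is made up here, and it assumes src is a properly null-terminated string and that dest has room for n bytes. It also covers the n == 0 case the question set aside.

```c
#include <string.h>

/* Hypothetical variant of the question's copy_string.
   Copies at most n - 1 characters of src into dest and always writes a
   terminating \0 when n > 0. Returns the number of characters stored,
   not counting the terminator. Assumes src is null-terminated. */
size_t copy_string_bounded(char *dest, const char *src, size_t n)
{
    size_t len;

    if (n == 0)
        return 0;            /* no room, not even for the terminator */

    len = strlen(src);       /* stop at the source size ... */
    if (len > n - 1)
        len = n - 1;         /* ... or at the destination size, whichever is smaller */

    memcpy(dest, src, len);  /* never reads past the end of src */
    dest[len] = '\0';
    return len;
}
```

With this version, bytes of dest beyond the terminator keep whatever they held before the call, and nothing beyond the end of src is ever read.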
The use of memcpy(dest, src, n-1) invokes undefined behavior if dest and src are not both at least n-1 in length.
For example, First\0 is six characters in length, but you read n-1 (9) characters from it; the contents of the memory past the end of the string literal are undefined, as is the behavior of your program when you read that memory.
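The arithmetic can be spelled out in a few lines. This is only an illustrative sketch: bytes_read_past_end is a made-up helper, and it assumes n is at least 1.

```c
#include <stddef.h>

/* sizeof on a string literal counts its characters plus the terminating \0. */
_Static_assert(sizeof "First" == 6, "5 characters + terminator");
_Static_assert(sizeof "Second" == 7, "6 characters + terminator");

/* Hypothetical helper: how many bytes memcpy(dest, src, n - 1) would read
   beyond a source object that is src_size bytes long. Assumes n >= 1. */
size_t bytes_read_past_end(size_t src_size, size_t n)
{
    size_t requested = n - 1;
    return requested > src_size ? requested - src_size : 0;
}
```

For n = 10 this gives 3 bytes past "First" and 2 bytes past "Second", which happens to line up with the stray 62 20 3D and 00 01 in the question's dumps. Since the over-read is undefined behavior, nothing guarantees the same bytes, or any particular outcome, on another compiler or run.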
The extra "stuff" is there because you've passed the buffer size to memcpy. It's going to copy that many characters, even when the source is shorter.
I'd do things a bit differently:
Unlike strncpy, strncat is defined to work how most people would reasonably expect.
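The code that went with this answer is not reproduced here, so the following is only a guess at what a strncat-based copy could look like, not the answerer's actual code. The name copy_string_strncat is made up, and dest is assumed to have room for n bytes.

```c
#include <string.h>

/* Hypothetical strncat-based copy. strncat appends at most n - 1
   characters from src, stops early at src's terminator, and always
   writes a terminating \0 afterwards, so at most n bytes of dest are
   written. Returns the length of the resulting string. */
size_t copy_string_strncat(char *dest, const char *src, size_t n)
{
    if (n == 0)
        return 0;              /* nothing can be stored */

    dest[0] = '\0';            /* start from an empty string */
    strncat(dest, src, n - 1); /* leaves room for the terminator */
    return strlen(dest);
}
```

The design point is that strncat's count limits the characters appended rather than the total bytes written, which is why n - 1 is passed rather than n.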