C strcpy() - evil?
Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid buffer overflows, the following (an implementation of the strdup() function for those not lucky enough to have it) safely uses strcpy() and should never overflow:
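```c
#include <stdlib.h>
#include <string.h>

/* Only for platforms that lack strdup(). */
char *strdup(const char *s1)
{
    char *s2 = malloc(strlen(s1) + 1);  /* +1 for the terminating '\0' */
    if (s2 == NULL)
    {
        return NULL;
    }
    strcpy(s2, s1);  /* cannot overflow: s2 was sized from strlen(s1) */
    return s2;
}
```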
The buffer s2 points to is guaranteed to have enough space to store the string s1, and using strcpy() saves us from having to store the strlen() result in another variable to use later as the unnecessary (in this case) length parameter to strncpy(). Yet some people write this function with strncpy(), or even memcpy(), which both require a length parameter. I would like to know what people think about this. If you think strcpy() is safe in certain situations, say so. If you have a good reason not to use strcpy() in this situation, please give it - I'd like to know why it might be better to use strncpy() or memcpy() in situations like this. If you think strcpy() is okay, but not here, please explain.
Basically, I just want to know why some people use memcpy(), others use strcpy(), and still others use plain strncpy(). Is there any logic to preferring one of the three (disregarding the buffer checks of the first two)?
I agree. I would recommend against strncpy() though, since it always pads your output to the indicated length. This is a historical decision, which I think was really unfortunate, as it seriously worsens performance.

Consider code that copies "foo" into a 128-byte buffer with strncpy(buf, "foo", sizeof buf). This will not write the expected four characters to buf, but will instead write "foo" followed by 125 zero characters. If you're, for instance, collecting a lot of short strings, your actual performance will be far worse than expected.

If available, I prefer to use snprintf(), writing the above as snprintf(buf, sizeof buf, "foo"). If instead copying a non-constant string, it's done as snprintf(buf, sizeof buf, "%s", input).

This is important, since if input contains % characters, snprintf() would interpret them, opening up whole shelvefuls of cans of worms.
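A minimal sketch of the snprintf()-based copy recommended above, assuming C99 snprintf() semantics. copy_string is a hypothetical helper name, not a standard function. It passes the source through "%s" so '%' characters are copied literally, and it reports truncation instead of zero-padding the rest of the buffer the way strncpy() does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst (dstsize bytes) using snprintf().
 * Unlike strncpy(), it writes only the characters plus one '\0' and does not
 * zero-fill the rest of dst. src goes through "%s", so any '%' in it is
 * copied literally rather than treated as a conversion specifier.
 * Returns true if src fit completely, false if it was truncated (or on an
 * encoding error). When dstsize > 0, dst is NUL-terminated either way. */
bool copy_string(char *dst, size_t dstsize, const char *src)
{
    int needed = snprintf(dst, dstsize, "%s", src);
    return needed >= 0 && (size_t)needed < dstsize;
}
```

For example, copy_string(buf, sizeof buf, "foo") writes four bytes into a 128-byte buf, whereas strncpy(buf, "foo", sizeof buf) writes all 128.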
I think strncpy is evil too.
To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.
This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.
Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.
So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):
The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.
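To make the idea of binding buffer and capacity together concrete, here is a rough sketch of such a type with a strdup replacement on top. The answer's own code is not shown; the bstring type and the bstr_* functions are hypothetical names invented for this illustration. In a real library the struct definition would live only in the .c file, so callers could not touch buf, len or cap directly.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type: buffer, length and capacity travel together. */
typedef struct {
    char  *buf;
    size_t len;   /* characters in use, excluding the terminator */
    size_t cap;   /* bytes allocated for buf, including the terminator */
} bstring;

/* Creates an empty string with room for cap - 1 characters; NULL on failure. */
bstring *bstr_new(size_t cap)
{
    if (cap == 0)
        return NULL;
    bstring *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->buf = malloc(cap);
    if (s->buf == NULL) {
        free(s);
        return NULL;
    }
    s->buf[0] = '\0';
    s->len = 0;
    s->cap = cap;
    return s;
}

/* Appends src; refuses (returns false) instead of overrunning the buffer. */
bool bstr_append(bstring *s, const char *src)
{
    size_t n = strlen(src);
    if (n >= s->cap - s->len)           /* n chars plus '\0' would not fit */
        return false;
    memcpy(s->buf + s->len, src, n + 1);
    s->len += n;
    return true;
}

/* strdup replacement: the copy is sized from the stored length. */
bstring *bstr_dup(const bstring *src)
{
    bstring *copy = bstr_new(src->len + 1);
    if (copy != NULL) {
        memcpy(copy->buf, src->buf, src->len + 1);
        copy->len = src->len;
    }
    return copy;
}

void bstr_free(bstring *s)
{
    if (s != NULL) {
        free(s->buf);
        free(s);
    }
}
```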
I'd tend to use memcpy if I have already calculated the length. Although strcpy is usually optimised to work on machine words, it feels like you should provide the library with as much information as you can, so it can use the most optimal copying mechanism.

But for the example you give, it doesn't matter - if it's going to fail, it will be in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumably strncpy is slower, as it has to check both for bounds and for nul), and any difference between memcpy and strcpy isn't worth changing code for speculatively.
The evil comes when people use it like this (although the below is super simplified):
Which is a situation that happens surprisingly often.
But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.
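The answer's example of misuse is not shown; a typical instance (assumed here) is strcpy(buf, input) into a fixed-size array when the length of input is unknown. The sketch below, using a hypothetical helper checked_strcpy, illustrates the answer's point: once strlen() has been checked against the destination size, strcpy() cannot overflow.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper. The risky pattern is an unchecked
 *     char buf[8]; strcpy(buf, input);
 * where the length of input is unknown. Measuring first makes strcpy() safe:
 * this refuses (returns false) instead of overflowing or truncating. */
bool checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    if (strlen(src) >= dstsize)   /* src plus its '\0' would not fit */
        return false;
    strcpy(dst, src);             /* safe: the length was checked above */
    return true;
}
```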
strlen only counts up to the first null terminator.
But in reality, buffers are not always null-terminated.
That's why people use different functions.
Well, strcpy() is not as evil as strdup() - at least strcpy() is part of Standard C.
In the situation you describe, strcpy is a good choice. This strdup will only get into trouble if s1 is not terminated with a '\0'.
I would add a comment indicating why there are no problems with strcpy, to prevent others (and yourself one year from now) wondering about its correctness for too long.
strncpy often seems safe, but may get you into trouble. If the source "string" is shorter than count, it pads the target with '\0' until it reaches count. That may be bad for performance. If the source string is longer than count, strncpy does not append a '\0' to the target. That is bound to get you into trouble later on when you expect a '\0' terminated "string". So strncpy should also be used with caution!
I would only use memcpy if I was not working with '\0' terminated strings, but that seems to be a matter of taste.
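A short sketch of the two strncpy() pitfalls described above and the usual manual fix. strncpy_terminated is a hypothetical wrapper, not a standard function; it guarantees a terminator but, as noted, still zero-fills short sources.

```c
#include <string.h>

/* Hypothetical wrapper around strncpy():
 *  - if strlen(src) >= n, plain strncpy(dst, src, n) writes n bytes and no '\0';
 *  - if src is shorter, strncpy() zero-fills dst up to n bytes.
 * This copies at most n - 1 characters, then forces a terminator into the
 * last byte, so dst is always a valid string when n > 0. It still inherits
 * the zero-filling of short sources. */
void strncpy_terminated(char *dst, const char *src, size_t n)
{
    if (n == 0)
        return;                 /* no room even for the terminator */
    strncpy(dst, src, n - 1);
    dst[n - 1] = '\0';
}
```

With a 4-byte buffer, strncpy(raw, "hello", 4) leaves 'h', 'e', 'l', 'l' and no terminator, while strncpy_terminated(fixed, "hello", 4) yields "hel".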
Problems:
There are probably other problems... Look, null termination isn't always a bad idea. There are situations where, for computational efficiency, or to reduce storage requirements it makes sense.
For writing general-purpose code, e.g. business logic, does it make sense? No.
This answer uses size_t and memcpy() for a fast and simple strdup().

Best to use type size_t, as that is the type returned from strlen() and used by malloc() and memcpy(). int is not the proper type for these operations.

memcpy() is rarely slower than strcpy() or strncpy() and often significantly faster.

§7.1.1 1 "A string is a contiguous sequence of characters terminated by and including the first null character. ..."
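The code from this answer is not shown; the following is a sketch of a strdup() replacement along the lines it describes, using size_t for the length and a single memcpy() that includes the terminating '\0'. It also stores the length once, as the next answer suggests. The name dup_string is hypothetical, chosen so it cannot collide with a library strdup() (which POSIX provides and C23 adds to <string.h>).

```c
#include <stdlib.h>
#include <string.h>

/* strdup() replacement using size_t and memcpy(). The name dup_string is
 * hypothetical, chosen to avoid clashing with a library strdup(). */
char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;   /* length plus the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);     /* length is known, so no per-byte '\0' test */
    return copy;
}
```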
Your code is terribly inefficient because it runs through the string twice to copy it.
Once in strlen().
Then again in strcpy().
And you don't check s1 for NULL.
Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.
memcpy can be faster than strcpy and strncpy because it does not have to compare each copied byte with '\0', and because it already knows the length of the copied object. It can be implemented with Duff's device, or use assembler instructions that copy several bytes at a time, like movsw and movsd.
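For illustration only, here is byte copying unrolled with Duff's device, the technique the answer mentions: the switch jumps into the middle of the do/while loop to handle count % 8 bytes, and each remaining pass copies eight. duff_copy is a hypothetical name; production memcpy() implementations usually rely on word-sized or vector moves instead, so this is not how any particular library does it.

```c
#include <stddef.h>

/* Illustrative byte copy unrolled eight ways with Duff's device.
 * The first (partial) pass copies count % 8 bytes (or 8 when that is 0);
 * every later pass copies exactly 8. */
void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    if (count == 0)
        return;
    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```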
I'm following the rules in here. Let me quote from it

For that reason, you will not get a trailing '\0' in a string if you hit the n limit without finding a '\0' in the source string so far. It's easy to misuse it (of course, if you know about that pitfall, you can avoid it). As the quote says, it wasn't designed as a bounded strcpy. And I would prefer not to use it if not necessary. In your case, clearly its use is not necessary and you proved it. Why then use it?

And generally speaking, programming code is also about reducing redundancy. If you know you have a string containing n characters, why tell the copying function to copy at most n characters? You do redundant checking. It's less about performance and much more about consistent code. Readers will ask themselves what strcpy could do that would cross the n characters and make it necessary to limit the copying, only to read in the manuals that this cannot happen in that case. And there confusion starts among readers of the code.

As for the rationale for using mem-, str- or strn-, I choose among them as in the above linked document:

mem- when I want to copy raw bytes, like the bytes of a structure.
str- when copying a null-terminated string, and only when no overflow can possibly happen.
strn- when copying a null-terminated string up to some length, filling the remaining bytes with zero. Probably not what I want in most cases. It's easy to forget about the trailing zero-fill, but it's by design, as the above quote explains. So I would just code my own small loop that copies characters, adding a trailing '\0':

Just a few lines that do exactly what I want. If I wanted "raw speed", I could still look for a portable and optimized implementation that does exactly this bounded strcpy job. As always, profile first and then mess with it.

Later, C got functions for working with wide characters, called wcs- and wcsn- (for C99). I would use them likewise.
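The answer's own loop is not shown; a minimal sketch of such a bounded copy, under the hypothetical name bounded_copy, might look like this. It stops at size - 1 characters or at the source's '\0', always terminates the result (when size > 0), and does not zero-fill the rest of the destination.

```c
#include <stddef.h>

/* Hypothetical bounded copy written as a small loop. Copies at most
 * size - 1 characters from src, always adds a trailing '\0' when size > 0,
 * and leaves the rest of dst untouched (no strncpy()-style zero fill).
 * Returns the number of characters copied. */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t i = 0;
    if (size == 0)
        return 0;
    while (i < size - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```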
The reason why people use strncpy not strcpy is because strings are not always null terminated and it's very easy to overflow the buffer (the space you have allocated for the string with strcpy) and overwrite some unrelated bit of memory.
With strcpy this can happen, with strncpy this will never happen. That is why strcpy is considered unsafe. Evil might be a little strong.
Frankly, if you are doing much string handling in C, you should not ask yourself whether you should use strcpy or strncpy or memcpy. You should find or write a string library that provides a higher-level abstraction. For example, one that keeps track of the length of each string, allocates memory for you, and provides all the string operations you need.

This will almost certainly guarantee you make very few of the kinds of mistakes usually associated with C string handling, such as buffer overflows, forgetting to terminate a string with a NUL byte, and so on.
The library might have functions such as these:
I wrote one for the Kannel project, see the gwlib/octstr.h file. It made life much simpler for us. On the other hand, such a library is fairly simple to write, so you might write one for yourself, even if only as an exercise.
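A rough sketch of the kind of library described above, assuming a growable string that tracks its own length and allocates memory for you. The dynstr type and its functions are hypothetical and are not Kannel's octstr API; they only illustrate the idea. Because the length is stored, appending never needs a strlen(), and the contents may even include '\0' bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string; NOT Kannel's octstr API. */
typedef struct {
    char  *data;  /* kept '\0'-terminated for convenience */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated, including the terminator */
} dynstr;

/* Initialises s as an empty string; returns 0 on success, -1 if out of memory. */
int dynstr_init(dynstr *s)
{
    s->data = malloc(16);
    if (s->data == NULL)
        return -1;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 16;
    return 0;
}

/* Appends n bytes from src, growing the buffer as needed; returns 0 or -1. */
int dynstr_append(dynstr *s, const char *src, size_t n)
{
    if (n >= SIZE_MAX - s->len)          /* len + n + 1 would overflow */
        return -1;
    size_t need = s->len + n + 1;
    if (need > s->cap) {
        size_t newcap = s->cap;
        while (newcap < need)           /* double, or jump straight to need */
            newcap = (newcap > SIZE_MAX / 2) ? need : newcap * 2;
        char *p = realloc(s->data, newcap);
        if (p == NULL)
            return -1;                  /* s is left unchanged */
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Releases the buffer and resets s to an empty, unallocated state. */
void dynstr_free(dynstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```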
No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:

There are counter-arguments for the use of strlcpy; the Wikipedia page makes note that

However, I believe that this just forces people who know what they're doing to add a manual NULL termination, in addition to a manual adjustment to the argument to strncpy. Use of strlcpy makes it much easier to avoid buffer overruns caused by failing to NULL-terminate your buffer.

Also note that the lack of strlcpy in glibc or Microsoft's libraries should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.
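A minimal sketch of the strlcpy() contract as documented by OpenBSD, under the hypothetical name my_strlcpy so it cannot clash with a system strlcpy() (glibc provides one from release 2.38). It always NUL-terminates when size is non-zero and returns strlen(src), so truncation is detected with result >= size. This is not the BSD source, just an illustration of the semantics.

```c
#include <string.h>

/* Sketch of strlcpy() semantics under a hypothetical name. Copies at most
 * size - 1 characters, NUL-terminates whenever size > 0, and returns
 * strlen(src) so callers can test (result >= size) for truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = (srclen < size) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Unlike a loop that returns the number of characters copied, returning the source length lets the caller allocate a larger buffer and retry.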
I personally am of the mindset that if the code can be proven to be valid, and done so quickly, it is perfectly acceptable. That is, if the code is simple and thus obviously correct, then it is fine.

However, your assumption seems to be that while your function is executing, no other thread will modify the string pointed to by s1. What happens if this function is interrupted after successful memory allocation (and thus after the call to strlen), the string grows, and bam, you have a buffer overflow condition, since strcpy copies up to the NUL byte.

The following might be better:

Now, the string can grow through no fault of your own and you're safe. The result will not be a dup, but there won't be any crazy overflows, either.

The probability of the code you provided actually being a bug is pretty low (close to non-existent, if not non-existent, if you are working in an environment that has no support for threading whatsoever). It's just something to think about.

ETA: Here is a slightly better implementation:

There the number of characters is being returned. You can also:

Which will terminate it with a NUL byte. Either way is better than the one that I quickly put together originally.
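The answer's improved versions are not shown. One way to get the property it describes, sketched here with memcpy() under the hypothetical name dup_bounded, is to measure the length once, copy exactly that many bytes and write the terminator explicitly, so the copy can never exceed the allocation. If another thread really does modify s1 without synchronisation, that is a data race and undefined behaviour in C11 regardless, so this only limits the damage; proper locking is still needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the question's strdup(): the length is measured
 * once, the copy is bounded by that length, and the terminator is written
 * explicitly rather than copied from s1, so the write never goes past the
 * len + 1 bytes that were allocated. */
char *dup_bounded(const char *s1)
{
    size_t len = strlen(s1);
    char *s2 = malloc(len + 1);
    if (s2 == NULL)
        return NULL;
    memcpy(s2, s1, len);   /* never more than the len bytes measured */
    s2[len] = '\0';        /* terminate explicitly */
    return s2;
}
```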