A lot of the functions from the standard C library, especially the ones for string manipulation, and most notably strcpy(), share the following prototype:
char *the_function (char *destination, ...)
The return value of these functions is in fact the same as the provided destination. Why would you waste the return value on something redundant? It makes more sense for such a function to be void or to return something useful.
My only guess as to why is that it's easier and more convenient to nest the function call in another expression, for example:
printf("%s\n", strcpy(dst, src));
Are there any other sensible reasons to justify this idiom?
Comments (6)
As Evan pointed out, it is possible to do something like assigning malloc()ed memory a value without using a helper variable. (This example isn't the best one; it will crash on out-of-memory conditions, but the idea is obvious.)
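The snippet this answer referred to is not preserved here. As a rough sketch of the idea, and assuming a hypothetical helper named copy_to_heap (not from the original answer), the nested form and a checked variant might look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The unchecked nested form the answer alludes to would be something like:
 *     char *s = strcpy(malloc(10), "test");
 * which hands NULL to strcpy if malloc fails.
 *
 * copy_to_heap is a hypothetical helper (an assumption for illustration):
 * it heap-allocates a copy of src, or returns NULL if allocation fails. */
char *copy_to_heap(const char *src)
{
    assert(src != NULL);
    char *buf = malloc(strlen(src) + 1);
    if (buf == NULL)
        return NULL;          /* the unchecked nested form would crash here */
    return strcpy(buf, src);  /* strcpy returns buf, so it nests into the return */
}
```

The checked version needs the helper variable buf, which is exactly the thing the nested one-liner avoids.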
char *stpcpy(char *dest, const char *src);
returns a pointer to the end of the string, and is part of POSIX.1-2008. Before that, it was a GNU libc extension since 1992. It first appeared in Lattice C AmigaDOS in 1986.
gcc -O3 will in some cases optimize strcpy + strcat to use stpcpy or strlen + inline copying, see below.
C's standard library was designed very early, and it's very easy to argue that the str* functions are not optimally designed. The I/O functions were definitely designed very early, in 1972 before C even had a preprocessor, which is why fopen(3) takes a mode string instead of a flag bitmap like Unix open(2).
I haven't been able to find a list of functions included in Mike Lesk's "portable I/O package", so I don't know whether strcpy in its current form dates all the way back to there or if those functions were added later. (The only real source I've found is Dennis Ritchie's widely-known C History article, which is excellent but not that in depth. I didn't find any documentation or source code for the actual I/O package itself.)
They do appear in their current form in K&R first edition, 1978.
Functions should return the result of computation they do, if it's potentially useful to the caller, instead of throwing it away. Either as a pointer to the end of the string, or an integer length. (A pointer would be natural.)
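To illustrate what returning the end pointer buys you, here is a minimal sketch that chains stpcpy calls. It assumes a POSIX.1-2008 libc; join_stpcpy and its parameters are hypothetical names chosen for this example, and the caller is assumed to provide a large enough buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* request stpcpy (POSIX.1-2008) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenates n strings into out and returns a
 * pointer to the terminating '\0'. The caller must guarantee that out
 * has room for all parts plus the terminator. */
char *join_stpcpy(char *out, const char *const *parts, size_t n)
{
    assert(out != NULL && parts != NULL);
    char *end = out;
    *end = '\0';                      /* result is a valid string even if n == 0 */
    for (size_t i = 0; i < n; i++)
        end = stpcpy(end, parts[i]);  /* returns a pointer to the new terminator */
    return end;
}
```

Because each call resumes at the previous terminator, the already-written prefix is never rescanned.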
As @R says:
For example, calling strcat(bigstr, newstr[i]) in a loop to build up a long string from many short (O(1) length) strings has approximately O(n^2) complexity, but strlen/memcpy will only look at each character twice (once in strlen, once in memcpy).
Using only the ANSI C standard library, there's no way to efficiently look at every character only once. You could manually write a byte-at-a-time loop, but for strings longer than a few bytes, that's worse than looking at each character twice with current compilers (which won't auto-vectorize a search loop) on modern hardware, given efficient libc-provided SIMD strlen and memcpy. You could use length = sprintf(bigstr, "%s", newstr[i]); bigstr+=length;, but sprintf() has to parse its format string and is not fast.
There isn't even a version of strcmp or memcmp that returns the position of the difference. If that's what you want, you have the same problem as "Why is string comparison so fast in python?": an optimized library function that runs faster than anything you can do with a compiled loop (unless you have hand-optimized asm for every target platform you care about), which you can use to get close to the differing byte before falling back to a regular loop once you get close.
It seems that C's string library was designed without regard to the O(n) cost of any operation, not just finding the end of implicit-length strings, and strcpy's behaviour is definitely not the only example.
They basically treat implicit-length strings as whole opaque objects, always returning pointers to the start, never to the end or to a position inside one after searching or appending.
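The strlen/memcpy approach mentioned above can be sketched with ANSI C functions only. append_at is a hypothetical helper name, and the caller is assumed to track the current end of the buffer and to guarantee enough space:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src (including its terminator) to the
 * position end, which must point at the current terminating '\0' of the
 * string being built. Returns the new end position. */
char *append_at(char *end, const char *src)
{
    assert(end != NULL && src != NULL);
    size_t len = strlen(src);    /* first pass over src */
    memcpy(end, src, len + 1);   /* second pass; the + 1 copies the '\0' too */
    return end + len;            /* points at the new terminator */
}
```

Each source character is visited twice, but the destination prefix is never rescanned, so building a string from n short pieces stays linear rather than quadratic.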
History guesswork
In early C on a PDP-11, I suspect that strcpy was no more efficient than while(*dst++ = *src++) {} (and was probably implemented that way).
In fact, K&R first edition (page 101) shows that implementation of strcpy and says:
This implies they fully expected programmers to write their own loops in cases where you wanted the final value of dst or src. And thus maybe they didn't see a need to redesign the standard library API until it was too late to expose more useful APIs for hand-optimized asm library functions.
But does returning the original value of dst make any sense?
strcpy(dst, src) returning dst is analogous to x=y evaluating to x. So it makes strcpy work like a string assignment operator.
As other answers point out, this allows nesting, like foo( strcpy(buf,input) );. Early computers were very memory-constrained. Keeping your source code compact was common practice. Punch cards and slow terminals were probably a factor in this. I don't know historical coding standards or style guides or what was considered too much to put on one line.
Crusty old compilers were also maybe a factor. With modern optimizing compilers, char *tmp = foo(); / bar(tmp); is no slower than bar(foo());, but it is with gcc -O0. I don't know if very early compilers could optimize variables away completely (not reserving stack space for them), but hopefully they could at least keep them in registers in simple cases (unlike modern gcc -O0, which on purpose spills/reloads everything for consistent debugging). i.e. gcc -O0 isn't a good model for ancient compilers, because it's anti-optimizing on purpose for consistent debugging.
Possible compiler-generated-asm motivation
Given the lack of care about efficiency in the general API design of the C string library, this might be unlikely. But perhaps there was a code-size benefit. (On early computers, code-size was more of a hard limit than CPU time).
I don't know much about the quality of early C compilers, but it's a safe bet that they were not awesome at optimizing, even for a nice simple / orthogonal architecture like PDP-11.
It's common to want the string pointer after the function call. At an asm level, you (the compiler) probably have it in a register before the call. Depending on the calling convention, you either push it on the stack or you copy it to the right register where the calling convention says the first arg goes (i.e. where strcpy is expecting it). Or if you're planning ahead, you already had the pointer in the right register for the calling convention.
But function calls clobber some registers, including all the arg-passing registers. (So when a function gets an arg in a register, it can increment it there instead of copying to a scratch register.)
So as the caller, your code-gen options for keeping something across a function call include spilling it to the stack, saving it in a call-preserved register, or using the value the function returns in a register (e.g. dst = strcpy(dst, src); if you aren't nesting it).
All calling conventions on all architectures I'm aware of return pointer-sized return values in a register, so having maybe one extra instruction in the library function can save code-size in all callers that want to use that return value.
You probably got better asm from primitive early C compilers by using the return value of strcpy (already in a register) than by making the compiler save the pointer around the call in a call-preserved register or spill it to the stack. This may still be the case.
BTW, on many ISAs, the return-value register is not the first arg-passing register. And unless you use base+index addressing modes, it does cost an extra instruction (and tie up another reg) for strcpy to copy the register for a pointer-increment loop.
PDP-11 toolchains normally used some kind of stack-args calling convention, always pushing args on the stack. I'm not sure how many call-preserved vs. call-clobbered registers were normal, but only 5 or 6 GP regs were available (R7 being the program counter, R6 being the stack pointer, R5 often used as a frame pointer). So it's similar to but even more cramped than 32-bit x86.
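The listing this part of the answer compared is not preserved here. As a hedged sketch of the kind of C source being discussed, the two variants could look like the following; the function names and the appended literal are assumptions made for illustration, not the original code, and no particular compiler output is claimed for them.

```c
#include <assert.h>
#include <string.h>

/* Variant 1: feeds strcpy's return value to strcat, so the caller does
 * not need to keep its own copy of dst alive across the call. */
void copy_then_append_ret(char *dst, const char *src)
{
    dst = strcpy(dst, src);
    strcat(dst, " (more text)");   /* hypothetical literal */
}

/* Variant 2: ignores the return value and reuses the input argument,
 * so dst has to survive the strcpy call somewhere. */
void copy_then_append_arg(char *dst, const char *src)
{
    strcpy(dst, src);
    strcat(dst, " (more text)");   /* hypothetical literal */
}

/* Both variants must produce the same string; only the generated code
 * can differ. */
void check_variants_agree(void)
{
    char a[32], b[32];
    copy_then_append_ret(a, "hello");
    copy_then_append_arg(b, "hello");
    assert(strcmp(a, b) == 0);
}
```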
This is significantly more compact than a version which doesn't use dst =, and instead reuses the input arg for the strcat. (See both on the Godbolt compiler explorer.)
The -O3 output is very different: gcc for the version that doesn't use the return value uses stpcpy (returns a pointer to the tail) and then mov-immediate to store the literal string data directly to the right place.
But unfortunately, the dst = strcpy(dst, src) -O3 version still uses regular strcpy, then inlines strcat as strlen + mov-immediate.
To C-string or not to C-string
C implicit-length strings aren't always inherently bad, and have interesting advantages (e.g. a suffix is also a valid string, without having to copy it).
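As a small illustration of the suffix property, a pointer into the middle of a terminated string is itself a usable string, with no copy made. base_name below is a hypothetical helper written for this sketch (not the POSIX basename function):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the part of path after the last '/',
 * or path itself if it contains no '/'. The result points into the
 * original string; nothing is allocated or copied. */
const char *base_name(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

With an explicit-length string type, the same operation would need either a copy or a separate pointer-plus-length view.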
But the C string library is not designed in a way that makes efficient code possible, because char-at-a-time loops typically don't auto-vectorize and the library functions throw away results of work they have to do.
gcc and clang never auto-vectorize loops unless the iteration count is known before the first iteration, e.g. for(int i=0; i<n ;i++). ICC can vectorize search loops, but it's still unlikely to do as well as hand-written asm.
strncpy and so on are basically a disaster. e.g. strncpy doesn't copy the terminating '\0' if it reaches the buffer size limit, so you need to manually arr[n] = 0; before or after. But if the source string is shorter, it pads with 0 bytes out to the specified length, potentially touching a page of memory that never needed to be touched. (Also making it very inefficient for copying short strings into a large buffer that still has lots of space left.)
It appears to have been designed for writing into the middle of larger strings, not for avoiding buffer overflows.
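Both strncpy behaviours described above (no terminator on truncation, zero padding on short input) can be seen in a small sketch. copy_bounded is a hypothetical wrapper name, shown only to make the manual termination step concrete:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies at most size - 1 characters of src into
 * dst and always terminates the result. size must be at least 1. */
char *copy_bounded(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL && size > 0);
    /* strncpy writes no '\0' if src has size - 1 or more characters,
     * and zero-fills the rest of the size - 1 bytes if src is shorter. */
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';   /* the manual termination step */
    return dst;
}
```

The zero-fill still happens inside the wrapper, so copying a short string into a large buffer writes the whole buffer; this sketch fixes only the termination problem, not the cost.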
A few functions like snprintf are usable and do always nul-terminate. Remembering which does which is hard, and a huge risk if you remember wrong, so you have to check every time in cases where it matters for correctness.
As Bruce Dawson says: Stop using strncpy already!. Apparently some MSVC extensions like _snprintf are even worse.
strncat also exists in POSIX.2001 and is unrelated to strcpy; it does what you'd hope, a bounds-checked strcpy which always 0-terminates. But like strcat it still returns the original pointer so is not useful for efficiently appending strings into a buffer; it has to re-scan the leading part every time to find the current end if you simply call it repeatedly on the same buffer. The man page mentions "Shlemiel the painter".
I believe that your guess is correct: it makes it easier to nest the call.
It's also extremely easy to code.
The return value is typically left in the AX register (it is not mandatory, but it is frequently the case). And the destination is put in the AX register when the function starts.
To return the destination, the programmer needs to do... exactly nothing! Just leave the value where it is.
The programmer could declare the function as void. But that return value is already in the right spot, just waiting to be returned, and it doesn't even cost an extra instruction to return it! No matter how small the improvement, it is handy in some cases.
Same concept as Fluent Interfaces. Just making code quicker/easier to read.
I don't think this is really set up this way for nesting purposes, but more for error checking. If memory serves, none of the C standard library functions do much error checking on their own, and therefore it makes more sense that this would be to determine if something went awry during the strcpy call.