Are there any downsides to passing structs by value in C, rather than passing a pointer?
If the struct is large, there is obviously the performance aspect of copying lots of data, but for a smaller struct, it should basically be the same as passing several values to a function.
It is maybe even more interesting when used as return values. C only has single return values from functions, but you often need several. So a simple solution is to put them in a struct and return that.
Are there any reasons for or against this?
Since it might not be obvious to everyone what I'm talking about here, I'll give a simple example.
If you're programming in C, you'll sooner or later start writing functions that look like this:
void examine_data(const char *ptr, size_t len)
{
...
}
char *p = ...;
size_t l = ...;
examine_data(p, l);
This isn't a problem. The only issue is that you have to agree with your coworkers on the order of the parameters, so that all functions use the same convention.
But what happens when you want to return the same kind of information? You typically get something like this:
char *get_data(size_t *len)
{
...
*len = ...datalen...;
return ...data...;
}
size_t len;
char *p = get_data(&len);
This works fine, but is much more problematic. A return value is a return value, except that in this implementation it isn't. There is no way to tell from the above that the function get_data isn't allowed to look at what len points to. And nothing makes the compiler check that a value is actually returned through that pointer. So next month, when someone else modifies the code without understanding it properly (because they didn't read the documentation?), it gets broken without anyone noticing, or it starts crashing randomly.
So, the solution I propose is the simple struct
struct blob { char *ptr; size_t len; };
The examples can be rewritten like this:
void examine_data(const struct blob data)
{
... use data.ptr and data.len ...
}
struct blob blob = { .ptr = ..., .len = ... };
examine_data(blob);
struct blob get_data(void)
{
...
return (struct blob){ .ptr = ...data..., .len = ...len... };
}
struct blob data = get_data();
For some reason, I think that most people would instinctively make examine_data take a pointer to a struct blob, but I don't see why. It still gets a pointer and an integer; it's just much clearer that they go together. And in the get_data case it is impossible to mess up in the way I described before, since there is no input value for the length, and there must be a returned length.
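For reference, here is a minimal compilable sketch of the struct-based version described above. The placeholder string, the vowel-counting body of examine_data, and the const on ptr (needed because the sketch points at a constant array) are assumptions made for illustration; the question leaves those parts elided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A pointer and its length, kept together so they cannot be mixed up. */
struct blob {
    const char *ptr;   /* const added for this sketch only */
    size_t len;
};

/* Returns both values at once; there is no out-parameter to forget. */
struct blob get_data(void)
{
    static const char message[] = "hello";   /* placeholder data */
    return (struct blob){ .ptr = message, .len = sizeof message - 1 };
}

/* Takes the pair by value. As a stand-in for real work, it counts
 * the lowercase vowels in the data. */
size_t examine_data(struct blob data)
{
    assert(data.ptr != NULL || data.len == 0);

    size_t vowels = 0;
    for (size_t i = 0; i < data.len; i++) {
        char c = data.ptr[i];
        /* strchr would also match the terminating '\0', so exclude it. */
        if (c != '\0' && strchr("aeiou", c) != NULL)
            vowels++;
    }
    return vowels;
}
```

A caller would write something like `struct blob data = get_data();` followed by `examine_data(data);`, with no separate length variable to keep in sync.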
A simple solution is to return an error code as the return value and pass everything else as parameters to the function.
One of those parameters can of course be a struct, but I don't see any particular advantage in passing it by value; just pass a pointer.
Passing a structure by value is dangerous. You need to be very careful about what you are passing: remember there is no copy constructor in C, so if one of the structure's members is a pointer, only the pointer value is copied, which can be very confusing and hard to maintain.
Just to complete the answer (full credit to Roddy): stack usage is another reason not to pass a structure by value. Believe me, debugging a stack overflow is a real PITA.
Reply to comment:
Passing a struct by pointer means that some entity owns the object and knows exactly what should be released and when. Passing a struct by value creates hidden references to the struct's internal data (pointers to other structures, etc.), and this is hard to maintain (possible, but why?).
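The shallow-copy concern can be illustrated with a small sketch. The struct buffer type and the clobber_first_byte function are hypothetical names invented for this example, not something from the answer itself.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {        /* hypothetical type for this sketch */
    char *data;
    size_t len;
};

/* Receives a copy of the struct, but the copy's data pointer still
 * refers to the caller's bytes. */
void clobber_first_byte(struct buffer copy)
{
    assert(copy.data != NULL && copy.len > 0);

    copy.data[0] = 'X';   /* visible to the caller: the pointee is shared */
    copy.len = 0;         /* not visible to the caller: the field was copied */
}
```

After calling this with a struct that points at the caller's array, the caller's len is unchanged but the first byte of its array has been overwritten, which is the kind of hidden reference the answer warns about.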
Here's something no one mentioned:
Members of a const struct are const, but if that member is a pointer (like char *), it becomes char *const rather than the const char * we really want. Of course, we could assume that the const is documentation of intent, and that anyone who violates this is writing bad code (which they are), but that's not good enough for some (especially those who just spent four hours tracking down the cause of a crash).
The alternative might be to make a struct const_blob { const char *c; size_t l } and use that, but that's rather messy: it gets into the same naming-scheme problem I have with typedefing pointers. Thus, most people stick to just having two parameters (or, more likely for this case, using a string library).
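A short sketch of the difference, assuming the struct blob from the question and a const_blob variant along the lines suggested above (member names here follow the question's ptr/len rather than c/l; the function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct blob       { char *ptr;       size_t len; };
struct const_blob { const char *ptr; size_t len; };

/* const on the struct makes its fields read-only (ptr behaves like
 * char *const), but says nothing about the bytes ptr points to. */
void sneaky(const struct blob data)
{
    assert(data.ptr != NULL);
    if (data.len > 0)
        data.ptr[0] = '!';     /* compiles: the pointee is not const */
    /* data.ptr = NULL;           would not compile: the field is const */
}

/* With const_blob the pointee is read-only as well. */
size_t count_bangs(struct const_blob data)
{
    size_t n = 0;
    for (size_t i = 0; i < data.len; i++) {
        if (data.ptr[i] == '!')
            n++;
    }
    /* data.ptr[0] = '?';         would not compile: the pointee is const */
    return n;
}
```

The cost is the one the answer points out: two nearly identical struct types that have to be named and kept in step.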
I think that your question has summed things up pretty well.
One other advantage of passing structs by value is that memory ownership is explicit. There is no need to wonder whether the struct came from the heap or who is responsible for freeing it.
I'd say passing (not-too-large) structs by value, both as parameters and as return values, is a perfectly legitimate technique. One has to take care, of course, that the struct is either a POD type, or the copy semantics are well-specified.
Update: Sorry, I had my C++ thinking cap on. I recall a time when it was not legal in C to return a struct from a function, but this has probably changed since then. I would still say it's valid as long as all the compilers you expect to use support the practice.
Page 150 of PC Assembly Tutorial on http://www.drpaulcarter.com/pcasm/ has a clear explanation about how C allows a function to return a struct:
I use the following C code to verify the above statement:
Use "gcc -S" to generate assembly for this piece of C code:
The stack before call create:
The stack right after calling create:
I just want to point out that one advantage of passing your structs by value is that an optimizing compiler may be able to optimize your code better.
Taking into account all of the things people have said...
a. Returning each member in a register (probably optimal, but unlikely to be what actually happens...)
b. Returning the struct on the stack (slower than registers, but still better than a cold access to heap RAM... yay caching!)
c. Returning the struct through a pointer to the heap (It only hurts you when you read or write to it? A good compiler will pass the pointer, read it just once, reorder instructions, and access it much earlier than needed so it is ready when you are, to make life better? (shiver))
Some simple measures I will take after reading this...
I am not sure where 'too big' and 'too small' lie, but I guess the answer is between 2 and register count + 1 members.
If I made a struct that holds 1 member that is an int, then clearly we should not pass the struct. (Not only is it inefficient, it also makes the intention VERY murky... I suppose it has a use somewhere, but it's not common.)
If I make a struct that holds two items, it might have value in clarity, and compilers might optimize it into two variables that travel as a pair. (RISC-V specifies that a struct with two members returns both members in registers, assuming they are ints or smaller...)
If I make a structure that holds as many ints and doubles as there are registers in the processor, it is TECHNICALLY a possible optimization.
The moment I surpass the register count, though, it probably would have been worth it to keep the result struct behind a pointer and pass in only the parameters that were relevant. (That, and probably make the struct smaller and the function do less, because we have a LOT of registers on systems nowadays, even in the embedded world...)
For small structs (e.g. point, rect), passing by value is perfectly acceptable. But, apart from speed, there is one other reason why you should be careful passing/returning large structs by value: stack space.
A lot of C programming is for embedded systems, where memory is at a premium, and stack sizes may be measured in KB or even Bytes... If you're passing or returning structs by value, copies of those structs will get placed on the stack, potentially causing the situation that this site is named after...
If I see an application that seems to have excessive stack usage, structs passed by value is one of the things I look for first.
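To make the size difference concrete, here is a rough sketch with a hypothetical struct samples. The array size is arbitrary, and exactly how the copy is made (on the stack, via a hidden pointer to a temporary, and so on) depends on the compiler and ABI, so treat this as an illustration rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 1024   /* arbitrary size chosen for this sketch */

struct samples {            /* hypothetical "large" struct */
    int values[SAMPLE_COUNT];
};

/* By value: the callee works on its own copy of all
 * sizeof(struct samples) bytes. */
long sum_by_value(struct samples s)
{
    long total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++)
        total += s.values[i];
    return total;
}

/* By pointer: only sizeof(struct samples *) bytes are passed. */
long sum_by_pointer(const struct samples *s)
{
    assert(s != NULL);

    long total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++)
        total += s->values[i];
    return total;
}
```

Both functions compute the same result; the difference is only in how much data each call has to move, which is what matters on a target with a stack of a few kilobytes.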
One reason not to do this that has not been mentioned is that it can cause problems where binary compatibility matters.
Depending on the compiler used, structures can be passed via the stack or in registers, depending on compiler options and implementation.
See: http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
If two compilers disagree, things can blow up. Needless to say, the main reasons not to do this are stack consumption and performance.
To really answer this question, one needs to dig deep into the assembly land:
(The following example uses gcc on x86_64. Anyone is welcome to add other compilers and architectures, such as MSVC or ARM.)
Let's have our example program:
Compile it with full optimizations
Look at the assembly:
This is what we get:
Excluding the nopl pads, give_two_doubles() has 27 bytes while give_point() has 29 bytes. On the other hand, give_point() yields one fewer instruction than give_two_doubles().
What's interesting is that the compiler has been able to optimize mov into the faster SSE2 variants movapd and movsd. Furthermore, give_two_doubles() actually moves data in and out of memory, which makes things slow.
Apparently much of this may not be applicable in embedded environments (which is where the playing field for C is most of the time nowadays). I'm not an assembly wizard, so any comments would be welcome!
One thing people here have forgotten to mention so far (or I overlooked it) is that structs usually have padding!
Every char is 1 byte, every short is 2 bytes. How large is the struct? Nope, it's not 6 bytes. At least not on any of the more commonly used systems. On most systems it will be 8. The problem is that alignment is not constant; it's system dependent, so the same struct will have different alignment and different sizes on different systems.
Not only will that padding further eat up your stack, it also adds the uncertainty of not being able to predict the padding in advance, unless you know how your system pads and then look at every single struct in your app and calculate its size. Passing a pointer takes a predictable amount of space; there is no uncertainty. The size of a pointer is known for the system, it is always the same regardless of what the struct looks like, and pointer sizes are always chosen so that they are aligned and need no padding.
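One way to see the padding on a given system is to ask the compiler directly. The struct mixed below is a hypothetical layout with two chars and two shorts, chosen to match the 6-byte payload discussed above; the struct the answer originally showed is not reproduced here. The sizes and offsets printed depend on the platform, so none are hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct mixed {     /* hypothetical layout for this sketch */
    char  a;
    short b;
    char  c;
    short d;
};

/* Sum of the member sizes, ignoring any padding. */
size_t mixed_payload_size(void)
{
    return 2 * sizeof(char) + 2 * sizeof(short);
}

/* Number of padding bytes the compiler added on this platform. */
size_t mixed_padding(void)
{
    assert(sizeof(struct mixed) >= mixed_payload_size());
    return sizeof(struct mixed) - mixed_payload_size();
}

/* Prints where each member actually ended up. */
void print_mixed_layout(void)
{
    printf("sizeof(struct mixed) = %zu (payload %zu, padding %zu)\n",
           sizeof(struct mixed), mixed_payload_size(), mixed_padding());
    printf("offsets: a=%zu b=%zu c=%zu d=%zu\n",
           offsetof(struct mixed, a), offsetof(struct mixed, b),
           offsetof(struct mixed, c), offsetof(struct mixed, d));
}
```

On common desktop ABIs this layout typically reports 8 bytes with 2 bytes of padding, but the point of the answer is that this should be checked rather than assumed.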