Is it good practice to initialize arrays in C/C++?
I recently ran into a case where I needed to compare two files (golden and expected) to verify test results, and even though the data written to both files was the same, the files did not match.
On further investigation, I found a structure containing some integers and a 64-byte char array. In most cases not all bytes of the char array were used, and the unused bytes contained random data, which caused the mismatch.
This led me to ask whether it is good practice to initialize arrays in C/C++ as well, as is done in Java.
Comments (8)
It is good practice to initialise memory/variables before you use them - uninitialised variables are a big source of bugs that are often very hard to track down.
Initialising all the data is a very good idea when writing it to a file format: It keeps the file contents cleaner so they are easier to work with, less prone to problems if someone incorrectly tries to "use" the uninitialised data (remember it may not just be your own code that reads the data in future), and makes the files much more compressible.
The only good reason not to initialise variables before you use them is in performance-critical situations, where the initialisation is technically "unnecessary" and incurs a significant overhead. In most cases, though, initialising variables won't cause significant harm (especially if they are declared only immediately before they are used), and it will save you a lot of development time by eliminating a common source of bugs.
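As a rough sketch of the file-format point above, the following zeroes a whole record before filling it in, so that two records holding the same data are also byte-for-byte identical. The `Record` layout and the names `fill_record` and `same_bytes` are assumptions made up for illustration; the question only says the structure holds some integers and a 64-byte char array.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record layout, loosely modelled on the question:
// a couple of integers plus a fixed 64-byte name field.
struct Record {
    int  id;
    int  flags;
    char name[64];
};

// Give every byte of the record a known value (including any padding
// and the unused tail of `name`) before the real data is stored.
void fill_record(Record& r, int id, int flags, const char* name) {
    assert(name != nullptr);
    std::memset(&r, 0, sizeof r);                   // zero everything first
    r.id = id;
    r.flags = flags;
    std::strncpy(r.name, name, sizeof r.name - 1);  // last byte stays '\0'
}

// Byte-wise comparison, which is effectively what a file diff does.
bool same_bytes(const Record& a, const Record& b) {
    return std::memcmp(&a, &b, sizeof(Record)) == 0;
}
```

With this approach, writing the whole structure to a file produces the same bytes on every run for the same logical content, at the cost of one `memset` per record.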
Using an undefined value in an array results in undefined behaviour.
Thus the program is free to produce differing results. This may mean your files end up slightly different, or that the program crashes, or the program formats your hard drive, or the program causes demons to fly out of the user's nose (http://catb.org/jargon/html/N/nasal-demons.html).
This doesn't mean you need to define your array values when you create the array, but you must ensure you initialise any array value before you use it. Of course the simplest way to ensure this is to do this when you create the array.
Don't forget that for huge arrays of PODs there's a nice shorthand to initialise all members to zero
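The snippet that accompanied this answer did not survive, so the following is only a guess at the shorthand meant: an empty brace initializer, which zero-initializes every element of an array of plain data. The `Point` type and the function name are hypothetical.

```cpp
struct Point { int x; int y; };   // hypothetical POD type for illustration

// Returns true if an empty brace initializer left every element zeroed.
bool brace_init_zeroes_everything() {
    int   counts[1000] = {};   // every element is 0
    Point points[256]  = {};   // every member of every element is 0
    char  text[64]     = {};   // all 64 bytes are '\0'

    for (int v : counts) {
        if (v != 0) return false;
    }
    for (const Point& p : points) {
        if (p.x != 0 || p.y != 0) return false;
    }
    for (char ch : text) {
        if (ch != '\0') return false;
    }
    return true;
}
```

In C, the traditional spelling of the same idea is `= {0}`.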
I strongly disagree with the opinions given here that doing so is "eliminating a common source of bugs" or that "not doing so will mess with your program's correctness". If the program works with uninitialized values, then it has a bug and is incorrect. Initializing the values does not eliminate this bug, because they often still do not have the expected values at first use. However, when they contain random garbage, the program is more likely to crash in a different, random way on every run. Always having the same values may give more deterministic crash behaviour and make debugging easier.
For your specific question, it is also good security practice to overwrite unused parts before they are written to a file, because they may contain something from a previous use that you do not want to be written, like passwords.
If you don't initialize the values in a C++ array, they could be anything, so it is good practice to zero them out if you want predictable results.
But if you use the char array as a null-terminated string, then you should be able to write it to a file with a function that stops at the terminator.
Although in C++ it might be better to use a more object-oriented solution, e.g. vectors, strings, etc.
Keep in mind that leaving arrays uninitialized may have advantages, such as performance.
Only reading from uninitialized arrays is bad. Having them around without ever reading from uninitialized elements is fine.
Moreover, if your program has a bug that makes it read from an uninitialized element of an array, then "covering it up" by defensively initializing the whole array to a known value does not fix the bug; it can only make it surface later.
One could write a big article on the difference between the two styles one can encounter: people who always initialize variables when declaring them, and people who initialize them when necessary. I share a big project with someone who is in the first category, and I am now definitely more of the second type.
Always initializing variables has brought more subtle bugs and problems than not doing so, and I will try to explain why, recalling the cases I found.
First example:
This was the code written by the other guy. This function is the hottest function in our application (imagine a text index on 500 000 000 sentences in a ternary tree; the FIFO stack is used to handle the recursion, as we do not want to use recursive function calls).
This was typical of his programming style because he initialized variables systematically. The problem with that code was the hidden memcpy performed by the initialization, plus the two other copies of the structure (which, by the way, were not calls to memcpy; gcc is strange sometimes), so we had 3 copies and a hidden function call in the hottest function of the project.
Rewriting it left only one copy (with the added benefit on SPARC, where it runs, that the function became a leaf function thanks to the avoided call to memcpy and no longer needed to build a new register window). As a result the function was 4 times faster.
Another problem I found once, but I do not remember exactly where (so no code example, sorry): a variable that was initialized when declared but then used in a loop, with a switch, in a finite state automaton. The problem was that the initialization value was not one of the states of the automaton, and in some extremely rare cases the automaton didn't work correctly. By removing the initializer, the warning the compiler emitted made it obvious that the variable could be used before it was properly initialized. Fixing the automaton was easy then.
Moral: defensively initialising a variable may suppress a very useful compiler warning.
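The answer gives no code for the automaton case, so here is an invented miniature of the pattern it describes. The `State` enum and both functions are hypothetical. In `step_defensive`, the placeholder initializer hides a forgotten branch. In `step_fixed`, the variable has no initializer, which gives warnings such as GCC's `-Wmaybe-uninitialized` or Clang's `-Wsometimes-uninitialized` a chance to flag a path that forgets to assign it. Whether a particular compiler actually warns depends on the compiler, flags, and optimization level.

```cpp
enum class State { Start, InWord, Done };

// Defensive style: `next` gets a placeholder value up front, so the
// forgotten branch below silently falls back to it.
State step_defensive(State s, char c) {
    State next = State::Start;              // defensive placeholder
    switch (s) {
    case State::Start:
        next = (c == ' ') ? State::Start : State::InWord;
        break;
    case State::InWord:
        if (c == ' ') next = State::Done;   // bug: non-space case forgotten
        break;
    case State::Done:
        next = State::Done;
        break;
    }
    return next;
}

// No initializer on purpose: every branch has to assign `next`, and a
// branch that forgets to do so is something the compiler can report.
State step_fixed(State s, char c) {
    State next;
    switch (s) {
    case State::Start:
        next = (c == ' ') ? State::Start : State::InWord;
        break;
    case State::InWord:
        next = (c == ' ') ? State::Done : State::InWord;
        break;
    case State::Done:
        next = State::Done;
        break;
    }
    return next;
}
```

In the defensive version, a non-space character in `InWord` quietly sends the automaton back to `Start`, which is the kind of rare misbehaviour the answer describes.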
Conclusion: Initialise your variables wisely. Doing it systematically is nothing more than following a cargo cult. (My buddy at work is the worst cargo-culter one can imagine: he never uses goto, always initializes variables, uses a lot of static declarations (it's faster, you know; in fact it's even really slow on SPARC 64-bit), and makes all functions inline even if they have 500 lines, using __attribute__((always_inline)) when the compiler does not want to.)
First, you should initialize arrays, variables, etc. if not doing so will mess with your program's correctness.
Second, it appears that in this particular case, not initializing the array did not affect the correctness of the original program. Instead, the program meant to compare the files does not know enough about the file format used to tell if the files differ in a meaningful way ("meaningful" defined by the first program).
Instead of complaining about the original program, I would fix the comparison program to know more about the file format in question. If the file format isn't well documented then you've got a good reason to complain.
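A format-aware comparison could look roughly like the sketch below. It compares the integer fields and only the part of the char array up to the null terminator, ignoring whatever lies beyond it. The `Record` layout and the name `records_equivalent` are assumptions for illustration, and the sketch presumes the array holds a null-terminated string, which the question does not state explicitly.

```cpp
#include <cstring>

// Hypothetical record layout, loosely modelled on the question.
struct Record {
    int  id;
    int  flags;
    char name[64];
};

// Two records are "meaningfully" equal if the integers match and the
// names match as strings. std::strncmp stops at the first '\0' (or
// after 64 characters), so bytes past the terminator are never looked at.
bool records_equivalent(const Record& a, const Record& b) {
    return a.id == b.id &&
           a.flags == b.flags &&
           std::strncmp(a.name, b.name, sizeof a.name) == 0;
}
```

A comparison tool built this way would read both files record by record and call the function on each pair instead of diffing raw bytes.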
I would say that the good practice in C++ is using a std::vector<> instead of an array. This is not valid for C, of course.
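One reason this helps with the problem in the question: a `std::vector` created with a size value-initializes its elements, so a vector of `char` starts out as all zero bytes. A small sketch, with the helper name `make_name_field` and the default width of 64 chosen purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build a fixed-width, zero-padded field from a string. The vector's
// sizing constructor fills it with '\0', so no explicit memset is needed.
std::vector<char> make_name_field(const std::string& name,
                                  std::size_t width = 64) {
    assert(width > 0);
    std::vector<char> field(width);   // `width` zero bytes
    // Copy at most width - 1 characters so the last byte stays '\0'.
    const std::size_t n = name.size() < width - 1 ? name.size() : width - 1;
    for (std::size_t i = 0; i < n; ++i) {
        field[i] = name[i];
    }
    return field;
}
```

The contents can then be written out via `field.data()` and `field.size()`, and two fields built from the same string compare equal with `==`.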