Why is int rather than unsigned int used for C and C++ for loops?
This is a rather silly question, but why is int commonly used instead of unsigned int when defining a for loop for an array in C or C++?
for (int i = 0; i < arraySize; i++) {}
for (unsigned int i = 0; i < arraySize; i++) {}
I recognize the benefits of using int when doing something other than array indexing, and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it altogether and use a different type such as size_t?
Comments (11)
Using int is more correct from a logical point of view for indexing an array.

unsigned semantics in C and C++ do not really mean "not negative"; they are closer to "bitmask" or "modulo integer".

To understand why unsigned is not a good type for a "non-negative" number, consider the statements its arithmetic implies, such as "the difference of two non-negative integers is always a non-negative integer". Obviously that makes no sense... but it is how unsigned semantics in C and C++ actually work.

Actually, using an unsigned type for the size of containers is a design mistake of C++, and unfortunately we are now doomed to live with this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it sounds similar to "non-negative", but the name is irrelevant and what counts is the semantics... and unsigned is very far from "non-negative".

For this reason, when coding most loops over vectors, my personally preferred form converts the size to a plain int once, at the start of the loop.
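A minimal sketch of a loop in that spirit, not necessarily the exact form the answer showed. The function name sum_even_positions is hypothetical, and the sketch assumes the size of the vector fits in an int:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical example: adds up the elements stored at even indices,
// so the index itself is needed inside the loop body.
int sum_even_positions(const std::vector<int>& v) {
    // Assumption of this sketch: the size fits in an int.
    assert(v.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));

    int total = 0;
    // The size is converted to int once; the condition then compares int with int.
    for (int i = 0, n = static_cast<int>(v.size()); i < n; i++) {
        if (i % 2 == 0) {
            total += v[i];
        }
    }
    return total;
}
```

After the initialisation, no unsigned value takes part in the loop condition, so expressions such as n - 1 behave like ordinary integer arithmetic.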
(Of course this assumes that the size of the vector does not change during the iteration and that I actually need the index in the body; otherwise for (auto& x : v)... is better.)

Running away from unsigned as soon as possible and using plain integers has the advantage of avoiding the traps that follow from the design mistake of making size_t unsigned. For example, consider a loop over a vector pts whose condition is i < pts.size()-1. That code will have problems if pts is empty, because pts.size()-1 is a huge nonsense number in that case. Dealing with expressions where a < b-1 is not the same as a+1 < b, even for commonly used values, is like dancing in a minefield.

Historically, the justification for making size_t unsigned was to be able to use the extra bit for values, e.g. to allow 65535 elements in an array instead of just 32767 on 16-bit platforms. In my opinion, even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now, then 65535 won't be enough for long anyway).

Unsigned values are great and very useful, but NOT for representing container sizes or indexes; for sizes and indexes, regular signed integers work much better because the semantics are what you would expect.

Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
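The empty-vector trap described in this answer can be reproduced with a small sketch. The function names are hypothetical; pts stands in for the vector from the answer, and the signed variant assumes the size fits in an int:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Condition written as i < pts.size() - 1, evaluated at i == 0.
// For an empty vector the subtraction wraps to the largest size_t value,
// so the condition is true and a loop body would run on no data.
bool naive_condition_at_zero(const std::vector<int>& pts) {
    std::size_t i = 0;
    return i < pts.size() - 1;
}

// The algebraically "equivalent" form i + 1 < pts.size() does not wrap here.
bool rearranged_condition_at_zero(const std::vector<int>& pts) {
    std::size_t i = 0;
    return i + 1 < pts.size();
}

// Signed variant: counts the adjacent pairs (pts[i], pts[i + 1]).
// With int arithmetic, n - 1 is simply -1 for an empty vector.
int count_segments(const std::vector<int>& pts) {
    assert(pts.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int count = 0;
    for (int i = 0, n = static_cast<int>(pts.size()); i < n - 1; i++) {
        count++;
    }
    return count;
}
```

For an empty vector, naive_condition_at_zero reports that the loop would be entered, while the other two forms skip it.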
This is a more general phenomenon: people often don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable to the primitive integer types. E.g. everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much more easily with these types, too.

I have also seen several hard-to-detect bugs that came from using int or the like: code that all of a sudden crashed on large matrices and similar data. Just coding correctly with the correct types avoids that.
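One way such a large-matrix bug can arise is a flat index computed in int. A 50000 x 50000 matrix has 2,500,000,000 elements, which is more than a 32-bit int can hold (its maximum is 2,147,483,647). The sketch below keeps the arithmetic in size_t; the function name flat_index and the row-major layout are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (row, col) in a row-major matrix stored in one flat array.
// All arithmetic is done in std::size_t, so the product row * cols is not
// limited to the range of int.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);  // the caller is expected to pass an in-range column
    return row * cols + col;
}
```

On a platform with a 64-bit size_t, flat_index(49999, 49999, 50000) yields 2499999999, the index of the last element of the matrix above.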
It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.

Of course, if the dimension was read from a single-byte field in a file, then you know it is in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you are looping a fixed number of times, like 0 to 99. But there is still another reason not to use int: if you use i%2 in your loop body to treat even and odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
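The difference behind the i%2 remark can be seen in the results themselves. This is a small sketch with hypothetical function names; how much the difference costs in practice depends on the compiler and on how the result is used (a test such as i % 2 == 0 can often be compiled the same way for both types):

```cpp
#include <cassert>

// Signed remainder keeps the sign of the dividend (guaranteed since C++11),
// so i % 2 is -1 for negative odd i and cannot be computed as a plain bit mask.
int signed_remainder(int i) {
    return i % 2;
}

// Unsigned remainder by 2 is always 0 or 1, the same value as u & 1u.
unsigned unsigned_remainder(unsigned u) {
    return u % 2u;
}

// Small self-check of the two claims above.
void check_remainders() {
    assert(signed_remainder(-3) == -1);
    assert(unsigned_remainder(7u) == (7u & 1u));
}
```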
Not much difference. One benefit of int is that it is signed. Thus int i < 0 makes sense, while unsigned i < 0 doesn't much.

If indexes are calculated, that may be beneficial (for example, you get cases where the loop is never entered if some result is negative).

And yes, it is less to write :-)
Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. If it happens to be narrower or wider than those, you may encounter strange results when indexing a very large array that goes beyond its range.

On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.

Another advantage of these types is that they give context to someone who reads the code. When you see them, you know that the code will do array subscripting or pointer arithmetic, not just any calculation.

So, if you want to write bullet-proof, portable and context-sensible code, you can do it at the expense of a few keystrokes.

GCC even supports a typeof extension, which relieves you from typing the same type name all over the place: declare the loop counter i as typeof(arraySize). Then, if you change the type of arraySize, the type of i changes automatically.
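typeof is a GCC extension in C (and part of standard C as of C23). In C++ the standard counterpart is decltype, which this sketch uses instead of typeof; the function name sum_array is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Sums the first arraySize elements of data.
// The counter's type is derived from the bound, so changing the type of
// arraySize changes the type of i without touching the loop.
long sum_array(const int* data, std::size_t arraySize) {
    assert(data != nullptr || arraySize == 0);
    long total = 0;
    for (decltype(arraySize) i = 0; i < arraySize; i++) {
        static_assert(std::is_same<decltype(i), std::size_t>::value,
                      "i follows the type of arraySize");
        total += data[i];
    }
    return total;
}
```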
It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see a size_t counter compared against strlen(str), while if they're just doing something 10 times, you'll probably still see int.
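The two styles might look like the following sketch. Both function names are hypothetical, and the original listings may have differed in detail:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "Type perfectionist" style: strlen returns size_t, so the counter is size_t.
std::size_t count_spaces(const char* str) {
    assert(str != nullptr);
    std::size_t count = 0;
    for (std::size_t i = 0, len = std::strlen(str); i < len; i++) {
        if (str[i] == ' ') {
            count++;
        }
    }
    return count;
}

// A small fixed repetition count: plain int is the usual choice.
int repeat_ten_times() {
    int calls = 0;
    for (int i = 0; i < 10; i++) {
        calls++;
    }
    return calls;
}
```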
I use int because it requires less physical typing and it doesn't matter: they take up the same amount of space, and unless your array has a few billion elements you won't overflow, provided you're not using a 16-bit compiler, which I'm usually not.

Because unless you have an array bigger than 2 gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter whether the variable is signed or not.

So, why type more when you can type less?
Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
Consider a simple example: a for loop whose counter i is an unsigned int and whose condition is i < max, where max is a signed int.

If max happens to be a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers with the same rank but different signedness are compared, the signed one is treated as unsigned). On the other hand, the same loop written with an int counter does not have this issue: given a negative max, the loop is safely skipped.
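The conversion can be made visible with a sketch that only evaluates the loop condition at its first iteration. The function names are hypothetical, and the static_cast spells out the conversion that a comparison between unsigned int and int applies implicitly:

```cpp
#include <cassert>
#include <climits>

// Unsigned counter against a signed bound: max is converted to unsigned
// before the comparison, so a negative max becomes a very large value.
bool unsigned_counter_enters_loop(int max) {
    unsigned i = 0;
    return i < static_cast<unsigned>(max);
}

// Signed counter against a signed bound: no conversion takes place.
bool signed_counter_enters_loop(int max) {
    int i = 0;
    return i < max;
}

// Self-check for the case discussed above, max == -1.
void check_negative_bound() {
    assert(static_cast<unsigned>(-1) == UINT_MAX);
    assert(unsigned_counter_enters_loop(-1));
    assert(!signed_counter_enters_loop(-1));
}
```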
Using a signed int is, in most cases, a mistake that could easily result in bugs as well as undefined behavior.

size_t typically matches the system's word size (64 bits on 64-bit systems and 32 bits on 32-bit systems), always allowing for the correct range for the loop and minimizing the risk of an integer overflow.

The int recommendation exists to work around a problem where reverse for loops were often written incorrectly by inexperienced programmers (of course, int might not have the correct range for the loop).

In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, the correct type for a for loop is, as a rule, size_t.

There's a nice talk about the misconception that signed variables are better than unsigned variables; you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
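A reverse loop with a size_t counter, as mentioned above, can be written so that the counter never wraps. This is one common pattern, offered as a sketch and not necessarily the listing the answer contained; reversed_copy is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits indices size-1 down to 0 with an unsigned counter.
// The decrement sits inside the body, so the test i > 0 happens before the
// subtraction and the counter never wraps, even when the vector is empty.
// (A condition such as i >= 0 would always be true for an unsigned counter.)
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i > 0;) {
        --i;
        out.push_back(v[i]);
    }
    assert(out.size() == v.size());
    return out;
}
```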
TL;DR;: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred almost in all cases and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
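The difference between testing after and testing before can be sketched for addition. Both function names are hypothetical; each returns false when the mathematically correct sum does not fit:

```cpp
#include <cassert>
#include <climits>

// Unsigned addition wraps modulo 2^N, so the test can run after the addition:
// a wrapped sum is smaller than the first operand.
bool add_unsigned_checked(unsigned a, unsigned b, unsigned* result) {
    assert(result != nullptr);
    *result = a + b;
    return *result >= a;
}

// Signed overflow is undefined behavior, so the test must run before the
// addition and has to handle both directions.
bool add_signed_checked(int a, int b, int* result) {
    assert(result != nullptr);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```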
For example, what is INT_MIN * -1? (The pre-processor will protect you, but without it you're in a jam.)

P.S.

As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if statement.

When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand, before the loop is entered.
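A sketch of such a guarded loop, keeping size_t throughout. The function name total_step is hypothetical, and pts stands in for the vector from the earlier answer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the differences between consecutive elements.
// The loop needs at least 2 elements, and that is checked before
// pts.size() - 1 is ever evaluated.
long total_step(const std::vector<int>& pts) {
    if (pts.size() < 2) {
        return 0;
    }
    assert(pts.size() - 1 < pts.size());  // the subtraction cannot wrap past the guard
    long total = 0;
    for (std::size_t i = 0; i < pts.size() - 1; i++) {
        total += static_cast<long>(pts[i + 1]) - pts[i];
    }
    return total;
}
```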