Disadvantages of scanf
I want to know the disadvantages of scanf().
On many sites, I have read that using scanf might cause buffer overflows. What is the reason for this? Are there any other drawbacks to scanf?
Comments (9)
Most of the answers so far seem to focus on the string buffer overflow issue. In reality, the format specifiers that can be used with the scanf functions support an explicit field width, which limits the maximum size of the input and prevents buffer overflow. This renders the popular accusations of string-buffer overflow dangers in scanf virtually baseless. Claiming that scanf is somehow analogous to gets in this respect is completely incorrect. There's a major qualitative difference between scanf and gets: scanf does provide the user with string-buffer-overflow-preventing features, while gets doesn't.
One can argue that these scanf features are difficult to use, since the field width has to be embedded into the format string (there's no way to pass it through a variadic argument, as can be done in printf). That is actually true. scanf is indeed rather poorly designed in that regard. But nevertheless, any claims that scanf is somehow hopelessly broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.
The real problem with scanf has a completely different nature, even though it is also about overflow. When a scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior. For this reason, the only proper way to perform the conversion in the C standard library is the functions from the strto... family.
So, to summarize the above, the problem with scanf is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.
P.S. The above is intended to be about the entire family of scanf functions (including also fscanf and sscanf). With scanf specifically, the obvious issue is that the very idea of using a strictly-formatted function for reading potentially interactive input is rather questionable.
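A minimal sketch of the two points above, under some assumptions: the names WORD_MAX, STR, read_word and parse_int are hypothetical and not from the answer. The first function embeds the field width in the format string through macro stringification, since scanf has no counterpart to printf's "%*s". The second uses strtol so that an out-of-range value is reported rather than causing undefined behavior.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define WORD_MAX 15              /* hypothetical limit, excludes the '\0' */
#define STR_(x) #x
#define STR(x) STR_(x)           /* expands WORD_MAX to the literal "15" */

/* Reads one whitespace-delimited word of at most WORD_MAX characters.
   The width is pasted into the format string at compile time.
   Returns 1 on success, 0 otherwise. */
int read_word(const char *src, char word[WORD_MAX + 1])
{
    return sscanf(src, "%" STR(WORD_MAX) "s", word) == 1;
}

/* Converts a decimal string to int and reports overflow instead of
   invoking undefined behavior. Returns 1 on success, 0 otherwise. */
int parse_int(const char *src, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(src, &end, 10);
    if (end == src)                       /* no digits at all */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                         /* out of range for int */
    *out = (int)v;
    return 1;
}
```

Note that parse_int as written accepts trailing characters after the number; a stricter caller would also inspect where end points.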
The problems with scanf are (at a minimum):
using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets, you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).
Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.
Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.
Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.
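As a rough illustration of the fgets approach described above, and not the answerer's own code, a line reader might look like the sketch below. The function name read_line and the LINE_* status codes are hypothetical. It takes a FILE * argument so it can be exercised on any stream; passing stdin gives the interactive behavior the answer talks about.

```c
#include <stdio.h>
#include <string.h>

enum { LINE_OK, LINE_NO_INPUT, LINE_TOO_LONG };   /* hypothetical status codes */

/* Reads one line from `in` into buf (capacity `size`, at least 2) and
   strips the newline. An over-long line is truncated and its remainder
   is discarded, so the next call starts at the next line. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return LINE_NO_INPUT;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';              /* complete line: drop the newline */
        return LINE_OK;
    }

    /* No newline stored: the line was too long, or input ended early. */
    int ch, extra = 0;
    while ((ch = getc(in)) != '\n' && ch != EOF)
        extra = 1;
    return extra ? LINE_TOO_LONG : LINE_OK;
}
```

A caller would typically follow a LINE_OK result with sscanf or strtol on buf, and treat LINE_TOO_LONG as a reason to ask the user again.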
It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).
And, a test driver for it:
Finally, a test run to show it in action:
From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?
It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.
As an example, what paxdiablo is doing in his function to read can be done with something like:
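One plausible form of such a call, offered as a sketch rather than the answerer's exact code, is a width-limited scanset followed by a second scanset that discards the excess. The function name read_line10 and the FILE * parameter are assumptions made so the idea can be tried on any stream.

```c
#include <stdio.h>

/* Reads one line from `in`, keeps at most its first 10 characters in buf
   and throws the rest of the line away, newline included.
   Returns 1 if characters were stored, 0 for an empty line,
   EOF at end of input. */
int read_line10(FILE *in, char buf[11])
{
    buf[0] = '\0';
    int stored = fscanf(in, "%10[^\n]", buf);   /* 0 means the line was empty */
    if (stored == EOF)
        return EOF;
    if (fscanf(in, "%*[^\n]") != EOF)           /* skip any excess characters */
        getc(in);                               /* then the newline itself */
    return stored;
}
```

The two scansets are issued as separate calls so that a failure of the first (an empty line) does not prevent the newline from being consumed.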
The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf along the same lines.
One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int with something like scanf("%d", &i);, the call cannot be used safely in case of an overflow. Even for the first case, reading a string is much simpler to do with fgets than with scanf.
Yes, you are right. There is a major security flaw in the scanf family (scanf, sscanf, fscanf, etc.), especially when reading a string, because they don't take the length of the buffer (into which they are reading) into account.
Example:
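A sketch of the kind of call this answer describes, assuming a 3-byte buf and the input "abcdef". The overflowing call is shown only as a comment because executing it is undefined behavior; the function name first_two_chars is hypothetical, and the live code uses a field width to stay within the buffer.

```c
#include <stdio.h>

/* Copies at most two characters of the first word in `src` into a
   3-byte buffer. Returns 1 if a word was stored. */
int first_two_chars(const char *src, char buf[3])
{
    /* Unsafe variant:  sscanf(src, "%s", buf);
       For "abcdef" this would write 7 bytes into a 3-byte array. */
    return sscanf(src, "%2s", buf) == 1;   /* 2 characters plus '\0' */
}
```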
Clearly the buffer buf can hold at most 3 chars, but sscanf will try to put "abcdef" into it, causing a buffer overflow.
The advantage of scanf is that once you learn how to use the tool, as you should always do in C, it has immensely useful use cases. You can learn how to use scanf and friends by reading and understanding the manual. If you can't get through that manual without serious comprehension issues, this would probably indicate that you don't know C very well.
scanf and friends suffered from unfortunate design choices that rendered them difficult (and occasionally impossible) to use correctly without reading the documentation, as other answers have shown. This occurs throughout C, unfortunately, so if I were to advise against using scanf then I would probably advise against using C.
One of the biggest disadvantages seems to be purely the reputation it's earned amongst the uninitiated; as with many useful features of C, we should be well informed before we use it. The key is to realise that, as with the rest of C, it seems succinct and idiomatic, but that can be subtly misleading. This is pervasive in C; it's easy for beginners to write code that they think makes sense and might even work for them initially, but doesn't make sense and can fail catastrophically.
For example, the uninitiated commonly expect that the %s directive would cause a line to be read, and while that might seem intuitive it isn't necessarily true. It's more appropriate to describe the field read as a word. Reading the manual is strongly advised for every function.
What would any response to this question be without mentioning its lack of safety and risk of buffer overflows? As we've already covered, C isn't a safe language, and will allow us to cut corners, possibly to apply an optimisation at the expense of correctness or more likely because we're lazy programmers. Thus, when we know the system will never receive a string larger than a fixed number of bytes, we're given the ability to declare an array that size and forgo bounds checking. I don't really see this as a downfall; it's an option. Again, reading the manual is strongly advised and would reveal this option to us.
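To make the word-versus-line distinction concrete, here is a small sketch using sscanf on a string. The function names and the 32-byte capacity are arbitrary choices for illustration. %s stops at the first whitespace character, while a scanset such as %[^\n] runs to the end of the line.

```c
#include <stdio.h>

/* Stores the first whitespace-delimited word of `line` in word. */
int first_word(const char *line, char word[32])
{
    return sscanf(line, "%31s", word) == 1;
}

/* Stores everything up to (not including) the first newline in text.
   Fails if the line is empty. */
int whole_line(const char *line, char text[32])
{
    return sscanf(line, "%31[^\n]", text) == 1;
}
```

Given "hello world\n", first_word stores "hello" and whole_line stores "hello world".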
Lazy programmers aren't the only ones stung by scanf. It's not uncommon to see people trying to read float or double values using %d, for example. They're usually mistaken in believing that the implementation will perform some kind of conversion behind the scenes, which would make sense because similar conversions happen throughout the rest of the language, but that's not the case here. As I said earlier, scanf and friends (and indeed the rest of C) are deceptive; they seem succinct and idiomatic but they aren't.
Inexperienced programmers aren't forced to consider the success of the operation. Suppose the user enters something entirely non-numeric when we've told scanf to read and convert a sequence of decimal digits using %d. The only way we can intercept such erroneous data is to check the return value, and how often do we bother checking the return value?
Much like fgets, when scanf and friends fail to read what they're told to read, the stream will be left in an unusual state:
With fgets, if there isn't sufficient space to store a complete line, then the remainder of the line left unread might be erroneously treated as though it's a new line when it isn't.
With scanf and friends, when a conversion fails as described above, the erroneous data is left unread on the stream and might be erroneously treated as though it's part of a different field.
It's no easier to use scanf and friends than to use fgets. If we check for success by looking for a '\n' when we're using fgets or by inspecting the return value when we use scanf and friends, and we find that we've read an incomplete line using fgets or failed to read a field using scanf, then we're faced with the same reality: we're likely to discard input (usually up until and including the next newline)! Yuuuuuuck!
Unfortunately, scanf simultaneously makes it both hard (non-intuitive) and easy (fewest keystrokes) to discard input in this way. Faced with this reality of discarding user input, some have tried scanf("%*[^\n]%*c");, not realising that the %*[^\n] directive will fail when it encounters nothing but a newline, and hence the newline will still be left on the stream.
With a slight adaptation, separating the two directives, we see some success: scanf("%*[^\n]"); getchar();. Try doing that with so few keystrokes using some other tool ;)
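Putting the return-value check and the two-step discard together might look like the sketch below. The function names and the FILE * parameter are assumptions for the sake of a testable example; with stdin it reduces to the scanf and getchar calls quoted above. It does nothing about arithmetic overflow in %d, which remains a separate concern.

```c
#include <stdio.h>

/* Discards the rest of the current line, including the newline. */
void discard_line(FILE *in)
{
    /* Two separate calls: if "%*[^\n]" matches nothing because the line
       is already empty, only that call fails and getc still removes
       the newline. */
    if (fscanf(in, "%*[^\n]") == EOF)
        return;                       /* nothing left on the stream */
    getc(in);
}

/* Expects one int per line. Returns 1 on success, 0 when the line does
   not start with a number (the offending line is dropped), and EOF at
   end of input. */
int read_int_line(FILE *in, int *out)
{
    int rc = fscanf(in, "%d", out);
    if (rc == EOF)
        return EOF;
    discard_line(in);                 /* leave the stream at the next line */
    return rc;
}
```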
Problems I have with the *scanf() family:
Unlike with printf(), you can't make the field width an argument in the scanf() call; it must be hardcoded in the conversion specifier.
Given input such as "12w4", scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.
If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.
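One way to reject a whole line like "12w4", sketched here under the assumption that the line has already been read into a string (for instance with fgets), is to ask sscanf how far it got with %n and check that nothing is left over. The function name parse_whole_line is hypothetical. This addresses only the trailing-junk problem; out-of-range values would still need strtol or similar.

```c
#include <stdio.h>

/* Accepts `line` only if it consists of one decimal integer, optionally
   followed by whitespace. Returns 1 and stores the value on success. */
int parse_whole_line(const char *line, int *value)
{
    int consumed = 0;
    int tmp;
    /* The space before %n skips trailing whitespace such as a newline;
       %n then records how many characters were consumed. */
    if (sscanf(line, "%d %n", &tmp, &consumed) != 1)
        return 0;                 /* no number at the start */
    if (line[consumed] != '\0')
        return 0;                 /* trailing junk such as "w4" */
    *value = tmp;
    return 1;
}
```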
Many answers here discuss the potential overflow issues of using scanf("%s", buf), but the latest POSIX specification more or less resolves this issue by providing an m assignment-allocation character that can be used in format specifiers for the c, s, and [ formats. This allows scanf to allocate as much memory as necessary with malloc (so it must be freed later with free).
An example of its use:
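A minimal sketch of the m modifier, assuming a POSIX.1-2008 C library (it is not part of ISO C, so this will not work everywhere). The function name first_word_alloc is hypothetical. Note that the argument for %ms is a char **, and the caller owns the returned memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a freshly allocated copy of the first whitespace-delimited
   word in `src`, or NULL if there is none. The caller must free() it. */
char *first_word_alloc(const char *src)
{
    char *word = NULL;
    if (sscanf(src, "%ms", &word) != 1)   /* library sizes the buffer */
        return NULL;
    return word;
}
```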
See here. The disadvantage of this approach is that it is a relatively recent addition to the POSIX specification and is not specified in the C standard at all, so it remains rather unportable for now.
There is one big problem with scanf-like functions: the lack of any type safety. That is, you can code this:
Hell, even this is "fine":
It's worse than with printf-like functions, because scanf expects a pointer, so crashes are more likely.
Sure, there are some format-specifier checkers out there, but those are not perfect, and they are not part of the language or the standard library.
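The kind of mismatch meant here can be illustrated as follows. The three commented-out calls are assumed examples, not the answerer's own, and are left as comments because running them is undefined behavior. Compilers such as GCC and Clang can warn about them with -Wformat, but nothing in the language requires a diagnostic. The function name read_int is hypothetical.

```c
#include <stdio.h>

/* Reads an int from `src`; the argument type matches "%d" exactly. */
int read_int(const char *src, int *out)
{
    /* Each of these compiles, possibly with a warning, yet is undefined:

         double d;  sscanf(src, "%d", &d);     wrong pointee type
         int i;     sscanf(src, "%d", i);      value passed, pointer expected
         int j;     sscanf(src, "%10s", &j);   string written into an int
    */
    return sscanf(src, "%d", out) == 1;
}
```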