Which functions in the C standard library commonly encourage bad practice?

Posted on 2024-10-10 01:23:40

Comments (14)

亣腦蒛氧 2024-10-17 01:23:40

Which C standard library functions are used inappropriately, or in ways that may lead to security problems, code defects, or inefficiencies?

I'll go with the obvious one:

char *gets(char *s);

Its remarkable peculiarity is that it is simply impossible to use correctly: it is never told the size of the buffer s, so any input line longer than the buffer overflows it.
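
A minimal sketch of the usual replacement, assuming the caller owns a fixed-size buffer: fgets() is given the buffer size, so an over-long line is truncated instead of overflowing. The helper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size - 1 characters), a bounded
   replacement for gets(). Strips the trailing newline if present.
   Returns buf, or NULL on EOF/error before anything was read. */
char *read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0'; /* drop the newline, if any */
    return buf;
}
```

Characters of an over-long line that did not fit stay in the stream and come back on the next call, so a real program may want to detect that case (no newline found) and discard or report the rest.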

西瑶 2024-10-17 01:23:40

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by calling it repeatedly until the entire string has been tokenized. Some library implementations store strtok()'s internal state in a global variable, which may cause nasty surprises if strtok() is called from multiple threads at the same time.

The CERT C Secure Coding Standard lists many of the pitfalls you asked about.
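
One way to sidestep both pitfalls, sketched here with a hypothetical helper next_token, is to scan with strspn() and strcspn(), which neither modify the input nor keep hidden state between calls. Tokens come back as a pointer plus a length rather than as NUL-terminated strings.

```c
#include <stddef.h>
#include <string.h>

/* Find the next token in s without modifying s and without hidden
   state (unlike strtok). Returns a pointer to the token start and
   stores its length in *len, or returns NULL if no token remains.
   The caller resumes scanning at token + *len. */
const char *next_token(const char *s, const char *delims, size_t *len)
{
    s += strspn(s, delims);          /* skip leading delimiters */
    if (*s == '\0')
        return NULL;
    *len = strcspn(s, delims);       /* token runs up to the next delimiter */
    return s;
}
```

A typical loop is `for (t = next_token(line, ",", &n); t; t = next_token(t + n, ",", &n))`, printing each token with `printf("%.*s\n", (int)n, t)`.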

冰之心 2024-10-17 01:23:40

In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).

This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).

Instead, use one of the strtol() family of functions.
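
A sketch of the strtol() replacement for atoi(), assuming the input must be a complete decimal number that fits in an int; the helper name parse_int is hypothetical. The checks it performs are exactly the ones atoi() cannot.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a decimal int from s, rejecting empty input, trailing junk
   and out-of-range values. Returns true and stores the value in *out
   on success; *out is left untouched on failure. */
bool parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;                       /* strtol reports overflow only via errno */
    long v = strtol(s, &end, 10);
    if (end == s)                    /* no digits at all */
        return false;
    if (*end != '\0')                /* trailing characters */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                /* does not fit in long, or in int */
    *out = (int)v;
    return true;
}
```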

君勿笑 2024-10-17 01:23:40

Let us extend the question to interfaces in a broader sense.

errno:

Technically, it is not even clear what it is: a variable, a macro, an implicit function call? In practice, on modern systems it is mostly a macro that expands to a function call so that each thread has its own error state. It is evil:

  • because it may cause overhead for the caller to access the value and to check the "error" (which might just be an exceptional event),
  • because in some places it even requires the caller to clear this "variable" before making a library call,
  • because it implements a simple error return by setting global state of the library.

The forthcoming standard makes the definition of errno somewhat more precise, but these ugly aspects remain.
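
The clear-before-call requirement in the second point is easiest to see with strtod(), the checked counterpart of atof(): it never resets errno on success, so a stale value left by an earlier call would otherwise look like a fresh error. A sketch with a hypothetical helper name:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* strtod() signals overflow only by setting errno to ERANGE and never
   clears errno on success, so the caller must reset it before the call
   and inspect it afterwards. Returns true and stores *out on success. */
bool parse_double(const char *s, double *out)
{
    char *end;
    errno = 0;                       /* discard stale errors from earlier calls */
    double v = strtod(s, &end);
    if (end == s || *end != '\0')    /* no number, or trailing junk */
        return false;
    if (errno == ERANGE)             /* overflow (underflow reporting varies) */
        return false;
    *out = v;
    return true;
}
```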

善良天后 2024-10-17 01:23:40

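There is often a strtok_r, a reentrant variant of strtok that keeps its position in a caller-supplied pointer (it is POSIX rather than ISO C).

For realloc, if you need to keep the old pointer (for example, to free it when reallocation fails), it's not that hard to assign the result to another variable. If your program fails with an allocation error, cleaning up the old pointer is often not really necessary.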
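
The "another variable" idiom might look like the following sketch (resize_buffer is a hypothetical helper). Writing p = realloc(p, n) directly would lose the only pointer to the old block whenever realloc returns NULL. The size is required to be nonzero because the behavior of realloc(p, 0) varies between implementations.

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes without leaking on failure.
   Returns 0 on success; on failure returns -1 and leaves *buf
   pointing at the original, still valid block. */
int resize_buffer(char **buf, size_t new_size)
{
    if (new_size == 0)
        return -1;                   /* realloc(p, 0) is not portable */
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;                   /* *buf is untouched and still owned */
    *buf = tmp;
    return 0;
}
```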

凝望流年 2024-10-17 01:23:40

I would put printf and scanf pretty high up on this list. Having to get the format specifiers exactly right makes these functions tricky to use and extremely easy to get wrong. It is also very hard to avoid buffer overruns when reading data with scanf. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers pass client-supplied strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.
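
A short sketch of the two defensive habits implied here; the helper names copy_untrusted and read_word are hypothetical. Untrusted text is always passed as an argument to a fixed format, never as the format itself, and every %s conversion in a scanf-family call gets a field width one less than the buffer size.

```c
#include <stdio.h>

/* Copy untrusted text into out as data, never as a format string:
   snprintf(out, n, user) would interpret any % directives in user,
   while "%s" copies it verbatim and bounds the output to n bytes. */
int copy_untrusted(char *out, size_t n, const char *user)
{
    return snprintf(out, n, "%s", user);
}

/* Read one whitespace-delimited word into a 16-byte buffer. The width
   15 (buffer size minus one for the NUL) keeps sscanf from overrunning
   word; a bare %s has no bound at all. */
int read_word(const char *input, char word[16])
{
    return sscanf(input, "%15s", word);
}
```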

时光瘦了 2024-10-17 01:23:40

Any of the functions that manipulate global state, such as gmtime() or localtime(), which return a pointer to shared static storage. These functions simply can't be used safely from multiple threads.

EDIT: rand() appears to be in the same category. At the very least there is no guarantee of thread safety, and on my Linux system the man page warns that it is non-reentrant and not thread-safe.
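
Where POSIX is available, the reentrant variants take a caller-supplied struct tm instead of returning a pointer to shared static storage. A sketch using gmtime_r() (POSIX, not part of C99 or C11); the helper utc_year is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Return the UTC calendar year of t, or -1 on failure. gmtime() would
   hand back a single static struct tm that any other call, from any
   thread, may overwrite; gmtime_r() fills the caller's struct instead. */
int utc_year(time_t t)
{
    struct tm tm;                    /* caller-owned: no shared state */
    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    return tm.tm_year + 1900;
}
```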

GRAY°灰色天空 2024-10-17 01:23:40

One of my bêtes noires is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting a NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as the solution to a problem, but is just as often a problem itself. Not always: it can be used safely, but only if you are careful. The same is true of most functions, with the notable exception of gets(), which cannot be used safely.
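
The non-reentrancy bites even in single-threaded code as soon as two tokenizing loops are nested, for example when splitting "key=value;key=value" first on ';' and then on '='. A sketch using POSIX strtok_r(), which keeps each loop's position in its own save pointer; the helper count_pairs is hypothetical, and the input is still cut up in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count well-formed "key=value" items in a ';'-separated list.
   Nesting two strtok() loops would fail because the inner loop
   overwrites the outer loop's hidden position. buf must be writable:
   strtok_r still replaces delimiters with NULs. */
int count_pairs(char *buf)
{
    int pairs = 0;
    char *outer_save, *inner_save;
    for (char *item = strtok_r(buf, ";", &outer_save); item != NULL;
         item = strtok_r(NULL, ";", &outer_save)) {
        char *key = strtok_r(item, "=", &inner_save);
        char *value = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && value != NULL)
            pairs++;                 /* items without '=' are skipped */
    }
    return pairs;
}
```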

空心空情空意 2024-10-17 01:23:40

There's already one answer about realloc, but I have a different take on it. Many times I've seen people write realloc when they mean free followed by malloc; in other words, when they have a buffer full of trash that needs to change size before new data is stored in it. This of course leads to a potentially large, cache-thrashing memcpy of trash that's about to be overwritten.
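
When the old contents really are trash, the free-then-malloc alternative is short; replace_buffer is a hypothetical helper. Nothing is copied, at the cost that the old data is gone even if the new allocation fails.

```c
#include <stdlib.h>

/* Replace *buf with a fresh block of new_size bytes, discarding the
   old contents without copying them. Returns the new block, or NULL
   (with *buf set to NULL) if the allocation fails. */
char *replace_buffer(char **buf, size_t new_size)
{
    free(*buf);                      /* old contents not needed: no memcpy */
    *buf = malloc(new_size);
    return *buf;
}
```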

If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.
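
For reference, geometric growth might be sketched as below, using a hypothetical int_vec type. Doubling the capacity whenever it runs out means n appends perform O(n) element copies in total, whereas growing by a fixed amount makes it O(n^2). The same pattern works with an explicit malloc, memcpy and free in place of realloc.

```c
#include <stdint.h>
#include <stdlib.h>

struct int_vec {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append x, doubling the capacity when full. Returns 0 on success,
   -1 on failure (v is left unchanged). */
int vec_push(struct int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;    /* geometric growth */
        if (new_cap < v->cap || new_cap > SIZE_MAX / sizeof *v->data)
            return -1;                               /* size would overflow */
        int *tmp = realloc(v->data, new_cap * sizeof *v->data);
        if (tmp == NULL)
            return -1;
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```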

If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.
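
If zero-filled memory is what you want, the new tail has to be cleared by hand. A sketch with a hypothetical helper realloc_zero; it needs the old size because realloc does not report it, and new_size is assumed to be nonzero.

```c
#include <stdlib.h>
#include <string.h>

/* Like realloc(p, new_size), but zeroes any bytes beyond old_size.
   realloc itself leaves them indeterminate, even for calloc'd blocks.
   On failure returns NULL and the original block is still valid. */
void *realloc_zero(void *p, size_t old_size, size_t new_size)
{
    unsigned char *q = realloc(p, new_size);
    if (q != NULL && new_size > old_size)
        memset(q + old_size, 0, new_size - old_size);
    return q;
}
```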

And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).

I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.

野稚 2024-10-17 01:23:40

How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.

In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.

My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.
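
The "large malloc blocks containing a number of structures" option is commonly implemented as an arena (bump) allocator. A minimal sketch under the assumption that everything placed in the arena shares one lifetime; the names are hypothetical, and it relies on C11 for _Alignof and max_align_t.

```c
#include <stddef.h>
#include <stdlib.h>

/* One large malloc holds many small objects that are released together
   with a single free, so they never fragment the general heap. */
struct arena {
    unsigned char *base;
    size_t size, used;
};

int arena_init(struct arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out n bytes aligned for any object type, or NULL if full. */
void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;  /* round up */
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Release every object in the arena at once. */
void arena_free_all(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```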

岁月染过的梦 2024-10-17 01:23:40

On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.

林空鹿饮溪 2024-10-17 01:23:40

Another answer, since the two are not really related: rand

  • it is of unspecified random quality
  • it is not re-entrant
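
If a program needs neither cryptographic strength nor rand()'s exact sequence, one workaround for both points is a small generator whose state the caller owns. A sketch using Marsaglia's xorshift32; the function name is hypothetical, and xorshift32 is a swapped-in algorithm, not something the standard specifies.

```c
#include <stdint.h>

/* Reentrant PRNG: all state lives in *state (which must be seeded
   nonzero), not hidden inside the library, so each thread can own
   its own generator. Not suitable for cryptography. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

For example, with *state == 1 the first call returns 270369. Each thread would keep its own state variable, seeded with its own nonzero value.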
债姬 2024-10-17 01:23:40

Some of these functions modify global state. On Windows this state is kept per thread, so you can get unexpected results. For example, the first call to rand in every thread will give the same result, and it takes some care to make the sequence pseudorandom yet deterministic (for debugging purposes).
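
One way to get per-thread sequences that are both distinct and reproducible is to derive each thread's seed from a single base seed and the thread's index. A sketch, assuming the program numbers its threads 0, 1, 2, ...; the helper thread_seed is hypothetical and the mixing function is the SplitMix64 finalizer. The result could seed a per-thread generator with caller-owned state.

```c
#include <stdint.h>

/* Reproducible per-thread seed. Multiplying the index by an odd
   constant and applying the (bijective) SplitMix64 finalizer means
   distinct thread indices always yield distinct seeds. */
uint64_t thread_seed(uint64_t base, uint64_t thread_index)
{
    uint64_t z = base + thread_index * 0x9E3779B97F4A7C15u;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9u;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBu;
    return z ^ (z >> 31);
}
```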

掀纱窥君容 2024-10-17 01:23:40

basename()dirname() 不是线程安全的。

basename() and dirname() aren't thread-safe.
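
POSIX basename() may modify its argument and may return a pointer to static storage. A sketch of a reentrant alternative that copies the result into a caller-supplied buffer (base_name_r is a hypothetical name); it follows basename() for common inputs but does not attempt every POSIX corner case, such as the implementation-defined handling of a leading "//".

```c
#include <string.h>

/* Copy the last component of path into out (truncating if needed),
   without modifying path or using static storage. Trailing slashes
   are ignored, "/" yields "/", and "" yields ".". */
char *base_name_r(const char *path, char *out, size_t outsz)
{
    if (outsz == 0)
        return NULL;
    size_t end = strlen(path);
    if (end == 0) {
        path = ".";
        end = 1;
    }
    while (end > 1 && path[end - 1] == '/')
        end--;                       /* strip trailing slashes, keep a lone "/" */
    size_t start = end;
    while (start > 0 && path[start - 1] != '/')
        start--;
    if (start == end)                /* path consisted only of slashes */
        start = 0;
    size_t n = end - start;
    if (n >= outsz)
        n = outsz - 1;               /* truncate to fit */
    memcpy(out, path + start, n);
    out[n] = '\0';
    return out;
}
```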
