Questions about putenv() and setenv()
I have been thinking a little about environment variables and have a few questions/observations.

putenv(char *string);

This call seems fatally flawed. Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted. Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed. Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed? Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

Considering the above issues, is there any reason to prefer putenv() over setenv()?
Comments (5)
Yes, it is fatally flawed.

It was preserved in POSIX (1988) because that was the prior art. The setenv() mechanism arrived later. Correction: The POSIX 1990 standard says in §B.4.6.1 "Additional functions putenv() and clearenv() were considered but rejected". The Single Unix Specification (SUS) version 2 from 1997 lists putenv() but not setenv() or unsetenv(). The next revision (2004) did define both setenv() and unsetenv() as well.

You're correct that a local variable is almost invariably a bad choice to pass to putenv(); the exceptions are obscure to the point of almost not existing. If the string is allocated on the heap (with malloc() et al), you must ensure that your code does not modify it. If it does, it is modifying the environment at the same time.

The exec*() functions make a copy of the environment and pass that to the executed process. There's no problem there.

The memory leak arises because once you have called putenv() with a string, you cannot use that string again for any purpose because you can't tell whether it is still in use, though you could modify the value by overwriting it (with indeterminate results if you change the name to that of an environment variable found at another position in the environment). So, if you have allocated space, the classic putenv() leaks it if you change the variable again. When putenv() began to copy data, allocated variables became unreferenced because putenv() no longer kept a reference to the argument, but the user expected that the environment would be referencing it, so the memory was leaked. I'm not sure what the fix was; I would 3/4 expect it was to revert to the old behaviour.

The original environment space is fixed; when you start modifying it, the rules change. Even with putenv(), the original environment is modified and could grow as a result of adding new variables, or as a result of changing existing variables to have longer values.

That is what the setenv() mechanism is likely to do. The (global) variable environ points to the start of the array of pointers to environment variables. If it points to one block of memory at one time and a different block at a different time, then the environment is switched, just like that.

Well, yes, you could get ENOMEM, but you'd have to be trying pretty hard. And if you grow the environment too large, you may be unable to exec other programs properly - either the environment will be truncated or the exec operation will fail.
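As a rough illustration of the environ array described above, the following sketch walks it and counts the entries. The helper name env_count is made up for this example; the only real interfaces it relies on are environ, and (in the usage note) setenv() and unsetenv(). It shows how the array can be inspected, not how any particular libc grows it.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <stddef.h>
#include <stdlib.h>

/* POSIX leaves it to the program to declare environ. */
extern char **environ;

/* Hypothetical helper: count the "NAME=value" entries currently in environ.
 * The array is terminated by a NULL pointer, just like argv. */
size_t env_count(void)
{
    size_t n = 0;

    if (environ == NULL)
        return 0;
    for (char **p = environ; *p != NULL; ++p)
        ++n;
    return n;
}
```

Calling env_count() before and after setenv("SOME_NEW_NAME", "1", 1), for a name that was not set, should show the count go up by one. The value of environ itself may also change at that point if the library has moved the pointer array to a larger block.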
Use setenv() in new code.
Update old code to use setenv(), but don't make it a top priority.
Do not use putenv() in new code.
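The difference between the two calls can be observed directly. The sketch below assumes a POSIX-conforming putenv() (one that keeps the caller's pointer, unlike the glibc 2.0 to 2.1.1 variant mentioned in the question). The function names and the DEMO_PUT / DEMO_SET variable names are invented for the example.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <stdlib.h>
#include <string.h>

/* Static storage, so the buffer outlives every caller: putenv() keeps a
 * pointer to it instead of copying it. */
static char demo_buf[32];

/* Returns 1 if editing our buffer after putenv() changed what getenv()
 * reports, 0 if it did not, -1 on error. */
int putenv_aliases_buffer(void)
{
    strcpy(demo_buf, "DEMO_PUT=one");
    if (putenv(demo_buf) != 0)
        return -1;
    memcpy(demo_buf + 9, "two", 3);      /* overwrite the value in place */
    const char *v = getenv("DEMO_PUT");
    return v != NULL && strcmp(v, "two") == 0;
}

/* Returns 1 if the environment kept its own copy after setenv(), so that
 * editing our buffer had no effect, 0 if not, -1 on error. */
int setenv_copies_value(void)
{
    char value[] = "one";

    if (setenv("DEMO_SET", value, 1) != 0)
        return -1;
    memcpy(value, "two", 3);             /* only our local copy changes */
    const char *v = getenv("DEMO_SET");
    return v != NULL && strcmp(v, "one") == 0;
}
```

On a conforming implementation both functions are expected to return 1: the putenv() entry follows the caller's buffer, while the setenv() entry does not.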
There is no special "the environment" space - setenv just dynamically allocates space for the strings (with malloc for example) as you would do normally. Because the environment doesn't contain any indication of where each string in it came from, it is impossible for setenv or unsetenv to free any space which may have been dynamically allocated by previous calls to setenv.

"Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted."

The purpose of putenv is to make sure that if you have a heap-allocated string it's possible to delete it on purpose. That's what the rationale text means by "the only function available to add to the environment without permitting memory leaks." And yes, you can call it with a local, just remove the string from the environment (putenv("FOO=") or unsetenv) before you return from the function.

The point is that using putenv makes the process of removing a string from the environment entirely deterministic. Whereas setenv will on some existing implementations modify an existing string in the environment if the new value is shorter (to avoid always leaking memory), and since it made a copy when you called setenv you're not in control of the originally dynamically allocated string so you can't free it when it's removed.

Meanwhile, setenv itself (or unsetenv) can't free the previous string, since - even ignoring putenv - the string may have come from the original environment instead of being allocated by a previous invocation of setenv.

(This whole answer assumes a correctly implemented putenv, i.e. not the one in glibc 2.0-2.1.1 you mentioned.)
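The "delete it on purpose" pattern described in this answer might look roughly like the sketch below. The helper with_env_entry, the callback demo_check and the variable name DEMO_SCOPED are all invented for illustration. The sketch assumes the variable was not set beforehand (an old value is not restored) and that no other thread touches the environment in the meantime.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run fn() with NAME=value in the environment, then
 * remove the entry and free the string that was handed to putenv(). */
int with_env_entry(const char *name, const char *value, void (*fn)(void))
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *entry = malloc(len);

    if (entry == NULL)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);

    if (putenv(entry) != 0) {       /* environment now points at entry */
        free(entry);
        return -1;
    }
    fn();                           /* callee sees NAME=value via getenv() */

    if (unsetenv(name) != 0)        /* entry may still be referenced, */
        return -1;                  /* so it is deliberately not freed here */
    free(entry);                    /* we own the string, so we may free it */
    return 0;
}

/* Example callback: records whether DEMO_SCOPED was visible with value "yes". */
int demo_seen;

void demo_check(void)
{
    const char *v = getenv("DEMO_SCOPED");
    demo_seen = (v != NULL && strcmp(v, "yes") == 0);
}
```

A call such as with_env_entry("DEMO_SCOPED", "yes", demo_check) makes the variable visible only for the duration of the callback. Because the caller allocated the string, it knows exactly when the string can be released, which is the determinism this answer refers to.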
Read the RATIONALE section of the setenv man page from The Open Group Base Specifications Issue 6.

putenv and setenv are both supposed to be POSIX compliant. If you have code with putenv in it, and the code works well, leave it alone. If you are developing new code you may want to consider setenv.

Look at the glibc source code if you want to see an example of an implementation of setenv (stdlib/setenv.c) or putenv (stdlib/putenv.c).
That's not how the environment is passed to the child. All of the various flavors of exec() (which you find in section 3 of the manual because they are library functions) ultimately invoke the system call execve() (which you find in section 2 of the manual). Its arguments are the path of the program to run, the argument vector, and the environment vector.

The vector of environment variables is passed explicitly (and may be partly constructed from the results of your putenv() and setenv() calls). The kernel copies these into the address space of the new process. Historically there was a limit to the size of your environment derived from the space available for this copy (similar to the argument limit) but I'm not familiar with the restrictions on a modern Linux kernel.
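To make the explicit hand-off concrete, here is a minimal sketch that starts a child with exactly the environment vector the caller supplies. The helper name run_with_env is invented for the example. fork(), execve() and waitpid() are the standard POSIX calls, and error handling is kept deliberately thin.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` and exactly the environment
 * in `envp`, wait for it, and return its exit status (-1 on failure). */
int run_with_env(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);   /* returns only if the exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with argv set to { "sh", "-c", "test \"$DEMO_CHILD\" = hello", NULL } and envp set to { "DEMO_CHILD=hello", NULL }, the child sees DEMO_CHILD even though the parent never called putenv() or setenv(). Nothing from the parent's own environ is passed along unless the caller puts it into envp.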
I would highly recommend against using either of these functions. Either can be used safely and without leaks, as long as you're careful and only one part of your code is responsible for modifying the environment, but it's hard to get right and dangerous if any code might be using threads and might read the environment (e.g. for timezone, locale, DNS config, etc. purposes).

The only two purposes I can think of for modifying the environment are to change the timezone at runtime, or to pass a modified environment to child processes. For the former, you probably have to use one of these functions (setenv/putenv), or you could walk environ manually to change it (this might be safer if you're worried other threads could try to read the environment at the same time). For the latter use (child processes), use one of the exec-family functions that lets you specify your own environment array, or simply clobber environ (the global) or use setenv/putenv in the child process after fork but before exec, in which case you don't have to care about memory leaks or thread safety because there are no other threads and you're about to destroy your address space and replace it with a new process image.
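The "modify the environment in the child, between fork and exec" option could be sketched as follows. The helper name run_with_extra_var is invented. The sketch assumes a single-threaded parent, which sidesteps the question of which functions are safe to call between fork() and exec in a threaded program.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `file` (looked up via PATH) with the parent's
 * environment plus NAME=value, without touching the parent's environment.
 * Returns the child's exit status, or -1 on failure. */
int run_with_extra_var(const char *name, const char *value,
                       const char *file, char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child only: this change lives in the child's copy of the address
         * space, which is about to be replaced by the new program anyway. */
        if (setenv(name, value, 1) != 0)
            _exit(126);
        execvp(file, argv);         /* inherits the child's environ */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After a call such as run_with_extra_var("DEMO_TZ", "UTC", "/bin/sh", argv), getenv("DEMO_TZ") in the parent is unaffected, so any leak or bookkeeping question about the child's setenv() disappears together with the child's old process image.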