Questions about putenv() and setenv()

Posted 2024-11-05 00:28:59 · 636 words · 1 view · 0 comments

I have been thinking a little about environment variables and have a few questions/observations.

  • putenv(char *string);

    This call seems fatally flawed. Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted. Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

  • The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

  • setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed. Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed? Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

  • Considering the above issues, is there any reason to prefer putenv() over setenv()?

Comments (5)

娇纵 2024-11-12 00:28:59
  • [The] putenv(char *string); [...] call seems fatally flawed.

Yes, it is fatally flawed. It was preserved in POSIX (1988) because that was the prior art. The setenv() mechanism arrived later. Correction: The POSIX 1990 standard says in §B.4.6.1 "Additional functions putenv() and clearenv() were considered but rejected". The Single Unix Specification (SUS) version 2 from 1997 lists putenv() but not setenv() or unsetenv(). The next revision (2004) did define both setenv() and unsetenv() as well.

Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted.

You're correct that a local variable is almost invariably a bad choice to pass to putenv() — the exceptions are obscure to the point of almost not existing. If the string is allocated on the heap (with malloc() et al), you must ensure that your code does not modify it. If it does, it is modifying the environment at the same time.
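
The aliasing described above can be seen in a small C sketch. The variable name DEMO_VAR and the helper putenv_aliases_buffer() are made up for illustration; the sketch assumes a putenv() that stores the caller's pointer, as POSIX specifies (so not the glibc 2.0-2.1.1 behaviour).

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Static storage: putenv() keeps this pointer, so the buffer must
 * outlive the call. A stack buffer here would dangle after return. */
static char demo_var[64] = "DEMO_VAR=first";

/* Returns 1 if editing the buffer after putenv() changed what
 * getenv() reports, 0 if it did not, -1 if putenv() failed. */
int putenv_aliases_buffer(void)
{
    if (putenv(demo_var) != 0)
        return -1;

    /* Overwrite the value in place; "DEMO_VAR=" is 9 characters long. */
    strcpy(demo_var + 9, "second");

    const char *seen = getenv("DEMO_VAR");
    return seen != NULL && strcmp(seen, "second") == 0;
}
```

Calling putenv_aliases_buffer() from main() should return 1 on a conforming implementation: no second putenv() call was made, yet the environment changed.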

Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

The exec*() functions make a copy of the environment and pass that to the executed process. There's no problem there.

The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

The memory leak arises because once you have called putenv() with a string, you cannot reuse that string for any other purpose, since you cannot tell whether the environment still refers to it. (You could change the value by overwriting the string in place, with indeterminate results if you change the name to that of an environment variable found at another position in the environment.) So, if you allocated the space, the classic putenv() leaks it when you change the variable again. When putenv() began to copy data, the allocated strings became unreferenced: putenv() no longer kept a reference to the argument, but callers expected the environment to be referencing it, so nobody freed the memory and it leaked. I'm not sure what the fix was; I would mostly expect that it was to revert to the old behaviour.

setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed.

The original environment space is fixed; when you start modifying it, the rules change. Even with putenv(), the original environment is modified and could grow as a result of adding new variables, or as a result of changing existing variables to have longer values.

Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed?

That is what the setenv() mechanism is likely to do. The (global) variable environ points to the start of the array of pointers to environment variables. If it points to one block of memory at one time and a different block at a different time, then the environment is switched, just like that.
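
As a rough illustration of the pointer array behind environ, the sketch below walks it and checks that adding a variable makes it one entry longer. The helper names and the variable SKETCH_NEW_VAR are hypothetical, and how the library actually grows the array (spare slots, reallocation, or a fresh block) is implementation-specific and not shown here.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

extern char **environ;   /* POSIX requires the program to declare this */

/* Counts entries by walking the NULL-terminated pointer array. */
size_t count_env_entries(void)
{
    size_t n = 0;
    if (environ != NULL)
        for (char **p = environ; *p != NULL; ++p)
            ++n;
    return n;
}

/* Returns 1 if adding a previously unset variable grew the array by
 * exactly one entry, 0 if it did not, -1 on error. */
int setenv_adds_one_entry(void)
{
    if (unsetenv("SKETCH_NEW_VAR") != 0)   /* start from a known state */
        return -1;
    size_t before = count_env_entries();

    if (setenv("SKETCH_NEW_VAR", "1", 1) != 0)
        return -1;
    size_t after = count_env_entries();

    unsetenv("SKETCH_NEW_VAR");            /* leave the environment as found */
    return after == before + 1;
}
```

Note that count_env_entries() rereads environ on every call, because the library is free to point it at a different block after any modification.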

Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

Well, yes, you could get ENOMEM, but you'd have to be trying pretty hard. And if you grow the environment too large, you may be unable to exec other programs properly - either the environment will be truncated or the exec operation will fail.
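
To get a feel for how far the environment is from the exec limit, one can compare its approximate size with sysconf(_SC_ARG_MAX). This is only a sketch: the helper names are invented, and the exact accounting the kernel applies to arguments and environment at exec time differs between systems.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Approximate footprint of the environment: every string with its
 * terminator, plus the pointer array including its NULL sentinel. */
size_t environment_bytes(void)
{
    size_t total = sizeof(char *);        /* the terminating NULL pointer */
    if (environ != NULL)
        for (char **p = environ; *p != NULL; ++p)
            total += strlen(*p) + 1 + sizeof(char *);
    return total;
}

/* Limit on the combined size of argv and envp passed to exec,
 * or -1 if the system reports it as indeterminate. */
long exec_arg_limit(void)
{
    return sysconf(_SC_ARG_MAX);
}
```

Printing both values from main() typically shows the environment using a small fraction of the limit, which fits the remark that you have to try fairly hard to hit it.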

Considering the above issues, is there any reason to prefer putenv() over setenv()?

  • Use setenv() in new code.
  • Update old code to use setenv(), but don't make it a top priority.
  • Do not use putenv() in new code.
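
A minimal sketch of the setenv() recommendation, using a made-up variable SKETCH_ANSWER: because setenv() copies both name and value, passing a local buffer is fine, and the third argument controls whether an existing value is replaced.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the environment kept its own copy of the value,
 * 0 if not, -1 if setenv() failed (errno is set, e.g. ENOMEM). */
int set_from_local(void)
{
    char value[32];                        /* stack buffer: OK with setenv() */
    snprintf(value, sizeof value, "%d", 42);

    if (setenv("SKETCH_ANSWER", value, 1) != 0)
        return -1;

    /* Scribble over the local buffer; the environment must not notice. */
    memset(value, 'x', sizeof value - 1);
    value[sizeof value - 1] = '\0';

    /* overwrite == 0 leaves an already-set variable untouched. */
    if (setenv("SKETCH_ANSWER", "ignored", 0) != 0)
        return -1;

    const char *seen = getenv("SKETCH_ANSWER");
    return seen != NULL && strcmp(seen, "42") == 0;
}
```

The same sequence with putenv() and a stack buffer would leave the environment pointing at memory that is modified here and invalid once the function returns.
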
冬天的雪花 2024-11-12 00:28:59

There is no special "the environment" space - setenv just dynamically allocates space for the strings (with malloc for example) as you would do normally. Because the environment doesn't contain any indication of where each string in it came from, it is impossible for setenv or unsetenv to free any space which may have been dynamically allocated by previous calls to setenv.

"Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted." The purpose of putenv is to make sure that if you have a heap-allocated string it's possible to delete it on purpose. That's what the rationale text means by "the only function available to add to the environment without permitting memory leaks." And yes, you can call it with a local, just remove the string from the environment (putenv("FOO=") or unsetenv) before you return from the function.

The point is that using putenv makes the process of removing a string from the environment entirely deterministic. Whereas setenv will on some existing implementations modify an existing string in the environment if the new value is shorter (to avoid always leaking memory), and since it made a copy when you called setenv you're not in control of the originally dynamically allocated string so you can't free it when it's removed.

Meanwhile, setenv itself (or unsetenv) can't free the previous string, since - even ignoring putenv - the string may have come from the original environment instead of being allocated by a previous invocation of setenv.

(This whole answer assumes a correctly implemented putenv, i.e. not the one in glibc 2.0-2.1.1 you mentioned.)
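
The ownership argument in this answer might look like the following in C. It is a sketch under the same assumption the answer states (a putenv() that stores the caller's pointer and an unsetenv() that only unlinks it); the name SKETCH_SCOPED and the helper are invented for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Adds a heap-allocated entry with putenv(), removes it again, and
 * frees it. Returns 1 if the variable was visible while installed and
 * gone afterwards, 0 otherwise, -1 on error. */
int scoped_putenv(void)
{
    static const char text[] = "SKETCH_SCOPED=on";

    char *entry = malloc(sizeof text);
    if (entry == NULL)
        return -1;
    memcpy(entry, text, sizeof text);

    if (putenv(entry) != 0) {
        free(entry);                 /* never installed, still ours */
        return -1;
    }
    int visible = getenv("SKETCH_SCOPED") != NULL;

    /* Unlink the entry first, so environ no longer points at it... */
    if (unsetenv("SKETCH_SCOPED") != 0)
        return -1;
    /* ...then the caller, who still owns the allocation, can free it. */
    free(entry);

    return visible && getenv("SKETCH_SCOPED") == NULL;
}
```

With setenv() the caller never holds the pointer that ends up in the environment, so there is no equivalent point at which the caller could free it.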

合约呢 2024-11-12 00:28:59

Read the RATIONALE section of the setenv man page from The Open Group Base Specifications Issue 6.

putenv and setenv are both supposed to be POSIX compliant. If you have code with putenv in it, and the code works well, leave it alone. If you are developing new code you may want to consider setenv.

Look at the glibc source code if you want to see an example of an implementation of setenv (stdlib/setenv.c) or putenv (stdlib/putenv.c).

面犯桃花 2024-11-12 00:28:59

Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec() functions. Am I wrong in that?

That's not how the environment is passed to the child. All of the various flavors of exec() (which you find in section 3 of the manual because they are library functions) ultimately invoke the system call execve() (which you find in section 2 of the manual). The arguments are:

   int execve(const char *filename, char *const argv[], char *const envp[]);

The vector of environment variables is passed explicitly (and may be partly constructed from the results of your putenv() and setenv() calls). The kernel copies these into the address space of the new process. Historically there was a limit to the size of your environment derived from the space available for this copy (similar to the argument limit) but I'm not familiar with the restrictions on a modern Linux kernel.
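
The explicit envp argument can be exercised with a short sketch. It assumes a POSIX shell at /bin/sh; GREETING and run_with_explicit_env() are hypothetical names. The child sees only the one variable handed to execve(), whatever the parent's environment contains.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /bin/sh with an environment holding only GREETING=hello and
 * returns the child's exit status (0 means the shell saw the value),
 * or -1 if the child could not be started or waited for. */
int run_with_explicit_env(void)
{
    char *const argv[] = { "sh", "-c", "test \"$GREETING\" = hello", NULL };
    char *const envp[] = { "GREETING=hello", NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);   /* kernel copies argv and envp */
        _exit(127);                      /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The library wrappers without an environment parameter, such as execl() and execvp(), pass the current environ instead, which is how earlier putenv() and setenv() calls reach the new program.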

时光病人 2024-11-12 00:28:59

I would highly recommend against using either of these functions. Either can be used safely and without leaks, as long as you're careful and only one part of your code is responsible for modifying the environment, but it's hard to get right and dangerous if any code might be using threads and might read the environment (e.g. for timezone, locale, dns config, etc. purposes).

The only two purposes I can think of for modifying the environment are to change the timezone at runtime, or to pass a modified environment to child processes. For the former, you probably have to use one of these functions (setenv/putenv), or you could walk environ manually to change it (this might be safer if you're worried other threads could try to read the environment at the same time). For the latter use (child processes), use one of the exec-family functions that lets you specify your own environment array, or simply clobber environ (the global) or use setenv/putenv in the child process after fork but before exec, in which case you don't have to care about memory-leaks or thread-safety because there are no other threads and you're about to destroy your address space and replace it with a new process image.
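
The fork-then-modify pattern from the last paragraph could be sketched as follows. It assumes a single-threaded parent and a POSIX shell at /bin/sh; CHILD_ONLY and the helper name are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sets CHILD_ONLY in the child between fork() and exec, then runs a
 * shell that checks for it. Returns the child's exit status (0 means
 * the variable arrived), or -1 on failure. */
int run_child_with_setenv(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Only the child's copy of the environment is changed, and the
         * whole address space is replaced by exec a moment later. */
        if (setenv("CHILD_ONLY", "yes", 1) != 0)
            _exit(126);
        execl("/bin/sh", "sh", "-c", "test \"$CHILD_ONLY\" = yes",
              (char *)NULL);
        _exit(127);                  /* reached only if execl() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After the call returns, getenv("CHILD_ONLY") in the parent still yields NULL (unless it was already set beforehand), since the parent's environment was never touched.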
