Legacy gcc compiler problem
We are using a legacy compiler, based on gcc 2.6.0, to cross-compile for an old embedded processor we are still using (yes, it has been in use since 1994!). The engineer who did the gcc port for this chip has long since moved on. Although we might be able to recover the gcc 2.6.0 source from somewhere on the web, the change set for this chip has disappeared in the halls of corporate history. We have muddled along until recently, as the compiler still ran and produced workable executables, but as of Linux kernel 2.6.25 (and also 2.6.26) it fails with the message "gcc: virtual memory exhausted", even when run with no parameters or with only -v. I have rebooted my development system (from 2.6.26) using the 2.6.24 kernel and the compiler works again (rebooting with 2.6.25 does not).
We have one system that we are keeping at 2.6.24 just for the purpose of doing builds for this chip, but are feeling a bit exposed in case the linux world moves on to the point that we cannot any longer rebuild a system that will run the compiler (i.e. our 2.6.24 system dies and we cannot get 2.6.24 to install and run on a new system because some of the software parts are no longer available).
Does anyone have any ideas for what we might be able to do to a more modern installation to get this legacy compiler to run?
Edit:
To answer some of the comments...
Sadly, it is the source code changes specific to our chip that are lost. This loss occurred over two major company reorgs and several sysadmins (a couple of whom really left a mess). We now use configuration control, but that is closing the barn door too late for this problem.
The use of a VM is a good idea, and may be what we end up doing. Thank you for that idea.
Finally, I tried strace as ephemient suggested and found that the last system call was brk(), which returned an error on the new system (2.6.26 kernel) and returned success on the old system (2.6.24 kernel). This would indicate that I really am running out of virtual memory, except that tcsh "limit" returns the same values on old and new systems, and /proc/meminfo shows the new system has slightly more memory and quite a bit more swap space. Maybe it is a problem of fragmentation or of where the program is being loaded?
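A minimal sketch of how the brk() comparison between the two kernels might be scripted. It assumes strace is installed and that the compiler is on the PATH as legacy-gcc (the name used later in this post); the file name brk.trace and both function names are made up for illustration.

```shell
# Record only the brk() system calls made by the legacy compiler.
# Assumptions: strace is available; legacy-gcc and brk.trace are placeholder names.
capture_brk() {
    strace -e trace=brk -o brk.trace legacy-gcc -v
}

# Print every brk() call where the kernel returned a different break than
# the one requested, which is how a refused brk() shows up in strace output.
# Queries such as brk(0) do not match the pattern and are skipped.
failed_brk() {
    sed -n 's/^brk(\(0x[0-9a-f]*\)) *= *\(0x[0-9a-f]*\)$/\1 \2/p' "$1" |
    while read -r req got; do
        if [ "$req" != "$got" ]; then
            echo "requested $req, kernel returned $got"
        fi
    done
}
```

Running capture_brk once under each kernel and then failed_brk brk.trace on each trace gives a side-by-side view of which requests were refused.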
I did some further research and "brk randomization" was added in kernel 2.6.25, however CONFIG_COMPAT_BRK is supposedly enabled by default (which disables brk randomization).
Edit:
OK, more info:
It really looks like brk randomization is the culprit: the legacy gcc is calling brk() to change the end of the data segment and that now fails, causing the legacy gcc to report "virtual memory exhausted". There are a few documented ways to disable brk randomization:
sudo echo 0 > /proc/sys/kernel/randomize_va_space
sudo sysctl -w kernel.randomize_va_space=0
starting a new shell with setarch i386 -R tcsh (or "-R -L")
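For reference, a hedged sketch of the same three approaches as they are commonly written. One detail worth knowing: in "sudo echo 0 > file" the redirection is performed by the calling, unprivileged shell, so on many systems the write is refused; piping through sudo tee avoids that. The tool name legacy-gcc is the one used later in this post, and running it directly under setarch (rather than starting a tcsh first) is a variation, not what was tried above.

```shell
# 1. Write the setting through a process that actually runs as root.
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# 2. The same setting via sysctl.
sudo sysctl -w kernel.randomize_va_space=0

# 3. Per-process alternative: -R turns off address randomization and
#    -L selects the legacy address-space layout for this one command only.
setarch i386 -R -L legacy-gcc -v
```

The first two change the setting system-wide until the next reboot; the third leaves the rest of the system untouched.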
I have tried them and they do seem to have an effect, in that the brk() return value differs from what I get without them and is now the same on every run (tried on both kernel 2.6.25 and 2.6.26), but the brk() still fails, so the legacy gcc still fails :-(.
In addition I have set vm.legacy_va_layout=1 and vm.overcommit_memory=2 with no change, and I have rebooted with the vm.legacy_va_layout=1 and kernel.randomize_va_space=0 settings saved in /etc/sysctl.conf. Still no change.
Edit:
Using kernel.randomize_va_space=0 on kernel 2.6.26 (and 2.6.25) results in the following brk() call being reported by strace legacy-gcc:
brk(0x80556d4) = 0x8056000
This indicates the brk() failed, but it looks like it failed because the data segment already ends beyond what was requested. Using objdump, I can see the data segment should end at 0x805518c whereas the failed brk() indicates that the data segment currently ends at 0x8056000:
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000013  080480d4  080480d4  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .hash         000001a0  080480e8  080480e8  000000e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000410  08048288  08048288  00000288  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynstr       0000020e  08048698  08048698  00000698  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.bss      00000038  080488a8  080488a8  000008a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rel.plt      00000158  080488e0  080488e0  000008e0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .init         00000008  08048a40  08048a40  00000a40  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .plt          000002c0  08048a48  08048a48  00000a48  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .text         000086cc  08048d10  08048d10  00000d10  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .fini         00000008  080513e0  080513e0  000093e0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .rodata       000027d0  080513e8  080513e8  000093e8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .data         000005d4  08054bb8  08054bb8  0000bbb8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .ctors        00000008  0805518c  0805518c  0000c18c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .dtors        00000008  08055194  08055194  0000c194  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 14 .got          000000b8  0805519c  0805519c  0000c19c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .dynamic      00000088  08055254  08055254  0000c254  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .bss          000003b8  080552dc  080552dc  0000c2dc  2**3
                  ALLOC
 17 .note         00000064  00000000  00000000  0000c2dc  2**0
                  CONTENTS, READONLY
 18 .comment      00000062  00000000  00000000  0000c340  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
no symbols
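The arithmetic behind those addresses can be checked directly. The sketch below uses only figures from the listing and the strace line above, plus one assumption: a 4096-byte page, the usual i386 value. The last allocated section, .bss, ends at 0x8055694, and rounding that up to a page boundary gives exactly the 0x8056000 that brk() returned. This is only arithmetic on the posted numbers; it does not by itself show why the kernel refuses the request.

```shell
# Figures taken from the objdump listing (.bss row) and the strace line.
bss_start=0x080552dc
bss_size=0x000003b8
requested=$(( 0x80556d4 ))
page=4096    # assumption: standard i386 page size

bss_end=$(( $bss_start + $bss_size ))
aligned=$(( (bss_end + page - 1) / page * page ))

printf 'end of .bss:         0x%x\n' "$bss_end"     # 0x8055694
printf 'next page boundary:  0x%x\n' "$aligned"     # 0x8056000
printf 'requested by brk():  0x%x\n' "$requested"   # 0x80556d4
```

The requested break lies 0x40 bytes past the end of .bss but below the page-aligned value the kernel reports as the current break.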
Edit:
To echo ephemient's comment below: "So strange to treat GCC as a binary without source"!
So, using strace, objdump, gdb and my limited understanding of 386 assembler and architecture, I have traced the problem to the first malloc call in the legacy code. The legacy gcc calls malloc, which returns NULL, which results in the "virtual memory exhausted" message on stderr. This malloc is in libc.so.5, and it calls getenv a bunch of times and ends up calling brk()... I guess to increase the heap... which fails.
From this I can only surmise that the problem is more than brk randomization, or I have not fully disabled brk randomization, despite the randomize_va_space=0 and legacy_va_layout=1 sysctl settings.
Comments (6)
Install linux + the old gcc onto a virtual machine.
Do you have the sources for this custom compiler? If you can recover the 2.6.0 baseline (and that should be relatively easy), then diff and patch should recover your change set.
What I'd then recommend is using that change set to build a new version against up to date gcc. AND THEN PUT IT UNDER CONFIGURATION CONTROL.
Sorry, don't mean to shout. It's just I've been saying the same thing for most of 30 years.
Can you strace the gcc-2.6.0 executable? It may be doing something like reading /proc/$$/maps, and getting confused when the output changes in insignificant ways. A similar problem was recently noticed between 2.6.28 and 2.6.29.
If so, you can hack /usr/src/linux/fs/proc/task_mmu.c or thereabouts to restore the old output, or set up some $LD_PRELOAD to fake gcc into reading another file.
Edit
Since you mentioned brk... CONFIG_COMPAT_BRK makes the default kernel.randomize_va_space=1 instead of 2, but that still randomizes everything other than the heap (brk).
See if your problem goes away if you echo 0 > /proc/sys/kernel/randomize_va_space or sysctl kernel.randomize_va_space=0 (equivalent).
If so, add kernel.randomize_va_space = 0 to /etc/sysctl.conf or add norandmaps to the kernel command line (equivalent), and be happy again.
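As a quick aid for reading that setting, here is a small sketch that maps the three values of kernel.randomize_va_space to their meaning as described in the kernel's sysctl documentation. The function name describe_va_space is made up for illustration.

```shell
# Translate a kernel.randomize_va_space value into plain words.
# describe_va_space is a hypothetical helper, not a standard tool.
describe_va_space() {
    case "$1" in
        0) echo "randomization off" ;;
        1) echo "stack, mmap base and VDSO randomized; heap (brk) not randomized" ;;
        2) echo "as 1, plus heap (brk) randomized" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Typical use on a Linux machine:
#   describe_va_space "$(cat /proc/sys/kernel/randomize_va_space)"
```

A value of 1 is what a kernel built with CONFIG_COMPAT_BRK defaults to, which matches the point above that everything except the heap is still randomized.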
I came across this and thought about your problem. Maybe you can find a way to play with the binary to move it to ELF format? Or maybe it is irrelevant, but playing with objdump can provide you more information.
Can you have a look at the process memory map?
So I have worked something out... it is not a complete solution, but it does get past the original problem I had with the legacy gcc.
Putting breakpoints on every libc call in the .plt (procedure linkage table) I see that malloc (in libc.so.5) calls getenv() to get:
So I web-searched these and found this which advised
then the legacy gcc WORKS!!!!
But not home free, it got up to the link in the build before failing, so there is something further going on with the legacy nld we have :-( It is reporting:
In /etc/sysctl.conf I have:
It still works the same if
but not if
There was a suggestion to use "ldd" to see the shared library dependencies: the legacy gcc only needs libc5, but the legacy nld also needs libg++.so.27, libstdc++.so.27, libm.so.5 and apparently there is a libc5 version of libg++.so.27 (libg++27-altdev ??)
and what about libc5-compat?
So, as I said, not yet home free... but getting closer. I'll probably post a new question about the nld problem.
Edit:
I was originally going to refrain from "Accepting" this answer since I still have a problem with the corresponding legacy linker, but in order to get some finality on this question at least, I am rethinking that position.
Thank-you's go out to: an0nym0usc0ward
Edit
Below is the last stuff that I learned, and now I will accept the VM solution since I could not fully solve it any other way (at least in the time allotted for this).
The newer kernels have a CONFIG_COMPAT_BRK build flag to allow libc5 to be used, so presumably building a new kernel with this flag will fix the problem (and looking through the kernel src, it looks like it will, but I can't be sure since I did not follow all of the paths). There is also another documented way to allow libc5 use at runtime (rather than at kernel build time): sudo sysctl -w kernel.randomize_va_space=0. This, however, does not do a complete job and some (most?) libc5 apps will still break, e.g. our legacy compiler and linker. This seems to be due to a difference in alignment assumptions between the newer and older kernels. I have patched the linker binary to make it think it has a bigger bss section, in order to bring the end of the bss up to a page boundary, and this works on the newer kernel when the sysctl var kernel.randomize_va_space=0. This is NOT a satisfactory solution to me since I am blindly patching a critical binary executable, and even though running the patched linker on the newer kernel produced a bit-identical output to the original linker run on the older kernel, that does not prove that some other linker input (i.e. we change the program being linked) will also produce identical results.
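How much a .bss section would have to grow to end on a page boundary is simple to compute. The sketch below is illustrative only: the linker's own section figures are not given in this post, so the example call uses the legacy gcc's .bss values from the objdump listing in the question, and the 4096-byte default page size is an assumption. It computes the padding and nothing more; it does not patch any binary.

```shell
# bss_padding START SIZE [PAGE]: bytes needed so that a section starting at
# START with length SIZE ends on a PAGE boundary (PAGE defaults to 4096).
# bss_padding is a hypothetical helper name.
bss_padding() {
    page=${3:-4096}
    end=$(( $1 + $2 ))
    echo $(( (page - end % page) % page ))
}

# Example with the legacy gcc's .bss (start 0x80552dc, size 0x3b8):
bss_padding 0x80552dc 0x3b8    # prints 2412 (0x96c)
```

A section that already ends on a page boundary yields 0.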
Could you not simply make a disc image that can be re-installed if the system dies? or make a VM?