Converting a memory address range in a running Linux process into symbols in an object file?
Here is a snippet of the file /proc/self/smaps:
00af8000-00b14000 r-xp 00000000 fd:00 16417 /lib/ld-2.8.so
Size: 112 kB
Rss: 88 kB
Pss: 1 kB
Shared_Clean: 88 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 88 kB
Swap: 0 kB
00b14000-00b15000 r--p 0001c000 fd:00 16417 /lib/ld-2.8.so
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Swap: 0 kB
It shows that this process (self) is linked to /lib/ld-2.8.so, and it shows two of the many byte ranges that are mapped into memory from that file.
The first range is 112 kB, of which 88 kB (22 pages of 4 kB) is resident; it is shared and clean, that is, it has not been written to. This is probably code.
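Checking the arithmetic for the first mapping: the address range gives the mapping's full size, while Rss counts only the pages that are currently resident.

```latex
\begin{aligned}
\texttt{0x00b14000} - \texttt{0x00af8000} &= \texttt{0x1c000} = 114688\ \text{bytes} = 112\ \text{kB} = 28 \text{ pages of } 4\ \text{kB} \\
\text{Rss} = 88\ \text{kB} &= 22 \text{ pages of } 4\ \text{kB}
\end{aligned}
```

The second mapping's file offset, 0x1c000, equals the size of the first mapping, so it begins exactly where the first one ends in the file.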
The second range, 4 kB (a single page), is not shared and is dirty: the process has written to it since it was mapped into memory from the file on disk. This is probably data.
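To find mappings like this second one systematically rather than by eye, the smaps records can be parsed. This is a minimal Python sketch, assuming the layout shown above (a header line followed by `Key: value kB` lines); `Mapping`, `parse_smaps` and `dirty_private` are hypothetical helper names, not part of any library.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A mapping header starts with "start-end", both in lowercase hex.
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


@dataclass
class Mapping:
    header: str
    fields: Dict[str, int] = field(default_factory=dict)  # values in kB


def parse_smaps(text: str) -> List[Mapping]:
    """Split smaps text into one Mapping per header line."""
    mappings: List[Mapping] = []
    for raw in text.splitlines():
        line = raw.strip()
        if HEADER_RE.match(line):
            mappings.append(Mapping(header=line))
        elif mappings and ":" in line:
            key, _, rest = line.partition(":")
            parts = rest.split()
            # Keep numeric "Key: N kB" lines; skip e.g. "VmFlags: rd ex".
            if parts and parts[0].isdigit():
                mappings[-1].fields[key] = int(parts[0])
    return mappings


def dirty_private(mappings: List[Mapping]) -> List[Tuple[str, int]]:
    """Headers and Private_Dirty (kB) of mappings that have private dirty pages."""
    return [
        (m.header, m.fields["Private_Dirty"])
        for m in mappings
        if m.fields.get("Private_Dirty", 0) > 0
    ]
```

On a live system the input would be the contents of `/proc/self/smaps` or `/proc/<pid>/smaps`; summing the results per library path would rank libraries by how much private dirty memory they create.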
But what is in that memory?
How do you convert the memory range 00b14000-00b15000 into useful information such as the line number of the file in which a large static structure is declared?
The technique will need to take account of prelinking and of address space randomization, such as that done by Exec Shield, and also of separate debugging symbols.
(The motivation is to identify popular libraries that also create dirty memory and to fix them, for example by declaring structures const.)
The format of the smaps header line is:
[BOTTOM]-[TOP] [PERM] [FILE OFFSET] [DEVICE] [INODE] [PATH]
b80e9000-b80ea000 rw-p 0001b000 08:05 605294 /lib/ld-2.8.90.so
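A minimal Python sketch for splitting such a header line into its fields, assuming the whitespace-separated layout above (the same layout used by `/proc/<pid>/maps`); `MapsEntry` and `parse_maps_line` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MapsEntry:
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: Optional[str]  # None for anonymous mappings


def parse_maps_line(line: str) -> MapsEntry:
    # At most 5 splits, so a path containing spaces stays in one piece.
    parts = line.split(None, 5)
    addr_range, perms, offset, dev, inode = parts[:5]
    start, end = (int(x, 16) for x in addr_range.split("-"))
    path = parts[5].strip() if len(parts) > 5 else None
    return MapsEntry(start, end, perms, int(offset, 16), dev, int(inode), path)
```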
So here, the content of the file /lib/ld-2.8.90.so starting at file offset 0x0001b000 is mapped at address 0xb80e9000 in the program's memory.
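The offset arithmetic this implies, as a small sketch with a hypothetical helper name. Because the result depends only on the mapping's own start address and file offset, it is unaffected by where the loader or address space randomization placed the library.

```python
def runtime_to_file_offset(addr: int, start: int, end: int, offset: int) -> int:
    """Translate a runtime address inside the mapping [start, end) to a file offset."""
    if not start <= addr < end:
        raise ValueError(f"{addr:#x} is outside mapping {start:#x}-{end:#x}")
    return addr - start + offset
```

For the question's dirty page, `runtime_to_file_offset(0x00b14000, 0x00b14000, 0x00b15000, 0x1c000)` gives file offset 0x1c000.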
To find the line number or C code for a mapped address, you need to match it against the ELF sections of the executable or library file and then extract the debugging symbols that GDB uses (if that executable or library still has them).
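One way to do the matching, sketched here in pure Python as an illustration rather than as the answerer's method, uses the ELF program headers instead of the section headers: find the PT_LOAD segment whose file range contains the offset, and convert the offset to the link-time virtual address that symbol tables and debug info refer to. For a prelinked library, `p_vaddr` already holds the prelinked address, so the same formula applies. The sketch assumes a well-formed ELF32 or ELF64 file and ignores extended program header numbering; `Segment`, `load_segments` and `file_offset_to_vaddr` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # program header type for loadable segments


class Segment(NamedTuple):
    offset: int  # p_offset: where the segment starts in the file
    vaddr: int   # p_vaddr: link-time (or prelinked) virtual address
    filesz: int  # p_filesz: number of bytes backed by the file


def load_segments(data: bytes) -> List[Segment]:
    """Return the PT_LOAD segments of an ELF32 or ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, elf_data = data[4], data[5]
    endian = "<" if elf_data == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    if elf_class == 2:  # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
        phdr_fmt = endian + "IIQQQQQQ"
    elif elf_class == 1:  # ELF32 header: e_phoff at 0x1C, e_phentsize at 0x2A
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
        phdr_fmt = endian + "8I"
    else:
        raise ValueError(f"unknown ELF class {elf_class}")

    segments = []
    for i in range(phnum):
        fields = struct.unpack_from(phdr_fmt, data, phoff + i * phentsize)
        if elf_class == 2:
            # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, ...
            p_type, _, p_offset, p_vaddr, _, p_filesz = fields[:6]
        else:
            # p_type, p_offset, p_vaddr, p_paddr, p_filesz, ...
            p_type, p_offset, p_vaddr, _, p_filesz = fields[:5]
        if p_type == PT_LOAD:
            segments.append(Segment(p_offset, p_vaddr, p_filesz))
    return segments


def file_offset_to_vaddr(segments: List[Segment], file_offset: int) -> int:
    """Translate a file offset into the link-time virtual address."""
    for seg in segments:
        if seg.offset <= file_offset < seg.offset + seg.filesz:
            return file_offset - seg.offset + seg.vaddr
    raise ValueError(f"offset {file_offset:#x} is not in any PT_LOAD segment")
```

The resulting address is the kind tools such as `addr2line -f -e <file> <address>` expect; a separate debug file produced from the same link should normally use the same addresses. A page-aligned mapping start can lie slightly before a segment's `p_offset`, so it is safer to look up the specific addresses of interest inside the page than the page start itself.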
The GDB file formats are documented (superficially) at http://sourceware.org/gdb/current/onlinedocs/gdbint_7.html#SEC60
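For data, such as a large static structure, the symbol table is often more useful than line tables. A sketch, assuming the text output of `nm -n -S` (run without `-C`, so names contain no spaces) on the library or on its separate debug file; stripped libraries keep only dynamic symbols (`nm -D`), which do not include file-local statics. `Symbol`, `parse_nm_sized` and `symbol_at` are hypothetical names.

```python
import bisect
from typing import List, NamedTuple, Optional


class Symbol(NamedTuple):
    addr: int
    size: int
    kind: str  # nm type letter, e.g. "D" (data), "B" (bss), "T" (text)
    name: str


def parse_nm_sized(output: str) -> List[Symbol]:
    """Keep only 'address size type name' lines; sizeless or undefined ones are skipped."""
    syms = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            addr, size = int(parts[0], 16), int(parts[1], 16)
        except ValueError:
            continue
        syms.append(Symbol(addr, size, parts[2], parts[3]))
    syms.sort()
    return syms


def symbol_at(syms: List[Symbol], vaddr: int) -> Optional[Symbol]:
    """Return the symbol whose [addr, addr + size) range contains vaddr, if any."""
    starts = [s.addr for s in syms]  # rebuilt per call; fine for a sketch
    i = bisect.bisect_right(starts, vaddr) - 1
    if i >= 0 and vaddr < syms[i].addr + syms[i].size:
        return syms[i]
    return None
```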
Look at SymtabAPI from the ParaDyn project (U. Wisc/U. Maryland). It runs on a number of platforms, and supports more than just ELF files (I believe it also supports COFF and a few others). There's documentation here.
Specifically, you might take a look at the AddressLookup class; I think it does exactly what you want. There are also facilities (getLoadAddresses()) for finding out which .so files are loaded at any given time, and I believe you can also extract the extent of the code sections of loaded modules, so you can tell what is in certain parts of memory.
Caveat: I think it will handle address space randomization properly, but I am not entirely sure.
You'll need to extract information from Linux's memory handler to determine how the application's virtual memory map relates to the pages given. It gets trickier if you also want to track information in pages that have been swapped out of memory.
You'll find some information here that will get you started. The process table includes some paging information, but you'll likely have to poke around in several different areas to get all the deep information you're looking for.
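One concrete kernel interface for the per-page side of this, named here as an illustration rather than as what the linked page describes, is `/proc/<pid>/pagemap` (available since Linux 2.6.25), which holds one 64-bit entry per virtual page. A minimal sketch, assuming the bit layout from the kernel's pagemap documentation (bit 63 present, 62 swapped, 61 file-page or shared-anonymous, 55 soft-dirty); recent kernels hide the physical frame number from unprivileged readers, so only the flag bits are decoded. The function names are hypothetical.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE


def decode_pagemap_entry(entry: int) -> dict:
    """Decode the flag bits of one 64-bit /proc/<pid>/pagemap entry."""
    return {
        "present": bool((entry >> 63) & 1),  # page is in RAM
        "swapped": bool((entry >> 62) & 1),  # page is in swap
        "file_or_shared_anon": bool((entry >> 61) & 1),
        "soft_dirty": bool((entry >> 55) & 1),
    }


def read_pagemap_entry(pid, vaddr: int) -> int:
    """Read the raw pagemap entry for the page containing vaddr (pid may be "self")."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        (entry,) = struct.unpack("=Q", f.read(8))  # native-endian u64
    return entry
```

For instance, `decode_pagemap_entry(read_pagemap_entry(pid, 0x00b14000))` would report whether the question's dirty page is resident or swapped out in that process.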
-Adam