Best way to watch a process's (and its subprocesses') filesystem read() I/O?

Posted 2024-08-05 21:41:38

I would like to develop a command-line program that works like this:

myprogram /c [some_executable_here]

It would launch the command specified by the user and "watch" the process (and any subprocesses) for read I/O; when that program exits, it would print a list of the files that were "read" (i.e. that ultimately resulted in a read() system call).

My initial OS for implementation is Windows, but I'd like to do the same kind of thing on Linux as well.

All the FileSystem watch-like APIs I've seen so far are geared towards watching directories (or individual files), not processes, so I'm not sure what the best way to go about this is.

EDIT: I'm looking for code examples of how to ultimately implement this (or at least pointers to APIs that I could follow) to do this on Windows and Linux.

Also, to be clear, it can't use a method like OpenedFilesView, procmon, or grepping strings from some system-level tool that can't definitively identify the process (and any subprocesses) by ID from the beginning to the end of its execution. In other words, there can't be any timing issues involved, and no possibility of a false positive from searching for "foo.exe" and getting the wrong one.

伤感在游骋 2024-08-12 21:41:38

On Linux, I'd definitely use strace -- it's simple and powerful. E.g.:

$ strace -o/tmp/blah -f -e trace=open,openat,read bash -c "cat ciao.txt"

This runs the requested command (including any subprocesses it spawns, thanks to -f) and writes to /tmp/blah a log (120 lines in my case, for this example) detailing all the open and read calls made by these processes, and their results.
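Since the question asks for a `myprogram /c ...`-style front end, the strace invocation can be wrapped in a few lines of Python. This is a minimal sketch, assuming strace is installed and on PATH and that the syscall names are those of x86-64 Linux (some architectures have no plain `open` syscall). `build_strace_cmd`, `run_traced` and `myprogram.py` are hypothetical names:

```python
import subprocess
import sys
import tempfile


def build_strace_cmd(cmd, trace_file):
    """Return the argv that runs `cmd` under strace.

    -f follows child processes, -o keeps the trace separate from the traced
    program's own output, and -e trace=... limits logging to the syscalls of
    interest (modern glibc usually opens files with openat, not open).
    """
    return ["strace", "-f", "-o", trace_file,
            "-e", "trace=open,openat,read", "--", *cmd]


def run_traced(cmd):
    """Run `cmd` under strace; return (exit status, path of the trace file)."""
    with tempfile.NamedTemporaryFile(prefix="strace-", suffix=".log",
                                     delete=False) as tmp:
        trace_file = tmp.name
    status = subprocess.call(build_strace_cmd(cmd, trace_file))
    return status, trace_file


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  python myprogram.py bash -c "cat ciao.txt"
    status, trace_file = run_traced(sys.argv[1:])
    print("exit status:", status)
    print("trace written to:", trace_file)
```

Putting the user's command after `--` stops strace from interpreting that command's own options. The resulting trace file is what you would substitute for /tmp/blah in the post-processing step.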

You do need a little processing afterwards to extract just the set of files that were successfully read, as you require; for example, with Python, you could do:

import re

# With -f, strace prefixes every line with the PID, e.g.
#   1234  openat(AT_FDCWD, "ciao.txt", O_RDONLY) = 3
linere = re.compile(r'^(\d+)\s+(\w+)\((.*)\)\s+=\s*(.*)$')
pathre = re.compile(r'"([^"]*)"')

def main():
    openfiles = {}    # (pid, fd) -> path; fds inherited across fork are not tracked
    filesread = set()
    with open('/tmp/blah') as f:
        for line in f:
            mo = linere.match(line)
            if mo is None:
                # e.g. "<unfinished ...>", "<... resumed>" or "+++ exited" lines
                print("Unmatched line %r" % line)
                continue
            pid, command, args, results = mo.groups()
            result = results.split(' ', 1)[0]    # "3", "0", "-1", ...
            if command in ('open', 'openat'):
                path = pathre.search(args)
                if path and not result.startswith('-'):    # skip failed opens
                    openfiles[pid, result] = path.group(1)
            elif command == 'read':
                fd = args.split(',', 1)[0]
                if result not in ('0', '-1') and (pid, fd) in openfiles:
                    filesread.add(openfiles[pid, fd])
            else:
                print("Unknown command %r" % command)
    print(sorted(filesread))

if __name__ == '__main__':
    main()

This is a bit oversimplified (you need to watch some other syscalls such as dup &c) but, I hope, shows the gist of the work needed. In my example, this emits:

['/lib/libc.so.6', '/lib/libdl.so.2', '/lib/libncurses.so.5',
 '/proc/meminfo', '/proc/sys/kernel/ngroups_max',
 '/usr/share/locale/locale.alias', 'ciao.txt']

So it also counts as "reads" the accesses made to load dynamic libraries &c, not just "data files"; at the syscall level, there's little difference. I imagine you could filter non-data files out, if that's what you need.
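One way to do that filtering is a simple path-based heuristic. This is a sketch: the directory list and the shared-library pattern are assumptions to tune for your system (for instance, /etc/ld.so.cache is not caught by them), and `is_system_file`/`data_files` are hypothetical names:

```python
import os
import re

# Hypothetical choice of what counts as "not a data file"; tune it for your system.
SYSTEM_DIRS = ("/lib", "/lib64", "/usr/lib", "/usr/lib64",
               "/usr/share/locale", "/proc", "/sys", "/dev")
SHARED_LIB_RE = re.compile(r"\.so(\.\d+)*$")  # libc.so.6, libfoo.so, ...


def is_system_file(path):
    """True for paths under SYSTEM_DIRS or that look like shared libraries."""
    path = os.path.normpath(path)
    in_system_dir = any(path == d or path.startswith(d + "/") for d in SYSTEM_DIRS)
    return in_system_dir or SHARED_LIB_RE.search(path) is not None


def data_files(paths):
    """Return, sorted, the paths that look like ordinary data files."""
    return sorted(p for p in paths if not is_system_file(p))


if __name__ == "__main__":
    # The list printed by the strace-parsing script in the example above.
    found = ['/lib/libc.so.6', '/lib/libdl.so.2', '/lib/libncurses.so.5',
             '/proc/meminfo', '/proc/sys/kernel/ngroups_max',
             '/usr/share/locale/locale.alias', 'ciao.txt']
    print(data_files(found))  # ['ciao.txt']
```

Matching whole path components (`d + "/"`) rather than bare prefixes keeps, say, /library/notes.txt from being mistaken for something under /lib.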

I find strace so handy for such purposes that, were I tasked to do the same job on Windows, my first try would be to go for StraceNT -- not 100% compatible, and of course the underlying syscall names &c differ, but I think I could account for these differences in my Python code (preparing and executing the strace command, and post-processing the results).

Unfortunately, some other Unix systems, to my knowledge, only offer this kind of facilities if you're root (super-user) -- e.g. on Mac OS X you need to go via sudo in order to execute such tracing utilities as dtrace and dtruss; I don't know of a straightforward port of strace to the Mac, nor other ways to perform such tasks without root privileges.
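Coming back to the "oversimplified" caveat in the parsing script above: descriptors can be duplicated, closed, and inherited by child processes, so a more careful replay keeps a per-process descriptor table. This is a sketch under stated assumptions: the trace was recorded with those syscalls included (on x86-64, e.g. `-e trace=open,openat,creat,read,dup,dup2,dup3,fcntl,close,clone,fork,vfork`), lines that strace splits into `<unfinished ...>`/`<... resumed>` halves are skipped, and close-on-exec descriptors are not dropped at execve. `FdTable` is a hypothetical name:

```python
import re

# Matches complete lines of `strace -f -o FILE` output, e.g.
#   1234  dup2(3, 1) = 1
LINE_RE = re.compile(r'^(\d+)\s+(\w+)\((.*)\)\s+=\s*(\S+)')
PATH_RE = re.compile(r'"([^"]*)"')


class FdTable:
    """Hypothetical helper: maps (pid, fd) to a path while replaying a trace."""

    def __init__(self):
        self.paths = {}    # (pid, fd) -> path
        self.read = set()  # paths with at least one read() returning > 0 bytes

    def feed(self, line):
        mo = LINE_RE.match(line)
        if mo is None:
            return  # "<unfinished ...>", "resumed", signal and exit lines are skipped
        pid, call, args, result = mo.groups()
        if result.startswith('-') or result == '?':
            return  # failed call, e.g. "-1 ENOENT (No such file or directory)"
        argv = [a.strip() for a in args.split(',')]
        if call in ('open', 'openat', 'creat'):
            path = PATH_RE.search(args)
            if path:
                self.paths[pid, result] = path.group(1)
        elif call in ('dup', 'dup2', 'dup3') or (
                call == 'fcntl' and argv[1].startswith('F_DUPFD')):
            # The new descriptor (the return value) refers to the same file.
            src = self.paths.get((pid, argv[0]))
            if src is None:
                self.paths.pop((pid, result), None)  # now points at something untracked
            else:
                self.paths[pid, result] = src
        elif call == 'close':
            self.paths.pop((pid, argv[0]), None)
        elif call in ('fork', 'vfork', 'clone', 'clone3'):
            # The return value is the child's PID; the child starts with a copy
            # of the parent's descriptors (threads really share them, which
            # this snapshot only approximates).
            for (p, fd), path in list(self.paths.items()):
                if p == pid:
                    self.paths[result, fd] = path
        elif call == 'read' and result != '0':
            path = self.paths.get((pid, argv[0]))
            if path is not None:
                self.read.add(path)
```

Feeding every line of the trace file to `feed()` and then printing `sorted(table.read)` should give the same kind of list as the simpler script, with duplicated and inherited descriptors mapped back to the file they were opened on.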

友欢 2024-08-12 21:41:38

Try Process Monitor (procmon.exe). It allows you to specify a filter (e.g. the name of the process to watch), and then lists all the files and the operations performed on them.

On Linux, try lsof for a current snapshot and strace for continuous monitoring. You'll have to filter the output with grep.

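lsof works by inspecting the process structure (i.e. the data structure the OS uses to manage a process) and enumerating the handles/file descriptors listed there; strace instead intercepts system calls as they are made (via ptrace), and Process Monitor captures file-system activity with a kernel-mode driver and attributes it to processes. Either way, this is a matter of process-level inspection or tracing, not of the directory-watching filesystem APIs.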
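On Linux, the kind of snapshot lsof takes can be reproduced by reading the /proc/PID/fd symlinks directly. This is a minimal sketch (Linux-only; `open_files` is a hypothetical helper). Like lsof, it only shows what is open at the instant you look, so on its own it has the timing problem the question wants to avoid:

```python
import os


def open_files(pid):
    """Snapshot of the descriptors a Linux process has open right now.

    Reads the /proc/PID/fd symlinks, roughly what lsof does on Linux.
    Inspecting another user's process typically requires root.
    Returns {fd: target}, where target is a path or e.g. "pipe:[12345]".
    """
    fd_dir = f"/proc/{pid}/fd"
    snapshot = {}
    for name in os.listdir(fd_dir):
        try:
            snapshot[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return snapshot


if __name__ == "__main__":
    for fd, target in sorted(open_files(os.getpid()).items()):
        print(fd, target)
```

The `try`/`except` matters even for the current process: `os.listdir()` itself briefly holds a descriptor on the directory, which is already closed by the time its entry is read back.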

[EDIT] See the section "How does it work" on this page to get started writing your own tool on Windows.

杀手六號 2024-08-12 21:41:38

In 2014, an option -d (--watchfd) was added to pv for closely watching the file descriptors of a given PID.

Easy to remember and helpful for debugging.

pv --help
  -d, --watchfd PID[:FD]   watch file FD opened by process PID

For example, to watch a process by its name:

pv -d `pgrep firefox`