Best way to observe the file system read() I/O of a process (and its subprocesses)?

Posted on 2024-08-05 21:41:38

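I want to develop a command-line program that works like this:

myprogram /c [some_executable_here]

It launches the command specified by the user and "watches" the process (and any subprocesses) for read I/O. When that program exits, it prints the list of files that were "read" (that is, files that ultimately resulted in a read() system call).

My initial target OS is Windows, but I'd like to do the same thing on Linux as well.

All of the file-system-watching APIs I've seen so far are geared towards watching directories (or individual files), not processes, so I'm not sure what the best approach is.

EDIT: I'm looking for code examples of how to ultimately implement this (or at least pointers to APIs that I could follow) on Windows and Linux.

Also, to be clear, it can't use an approach like OpenedFilesView, procmon, or grepping strings out of some system-level tool that can't positively identify the process by ID (and any subprocesses) from the start to the end of its execution. In other words, there can't be any timing issues involved, nor any possibility of a false positive from searching for "foo.exe" and getting the wrong one.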

伤感在游骋 2024-08-12 21:41:38

On Linux, I'd definitely use strace -- it's simple and powerful. E.g.:

$ strace -o/tmp/blah -f -eopen,read bash -c "cat ciao.txt"

runs the requested command (including the subprocesses it spawns, due to -f) and leaves in /tmp/blah a log (120 lines in my case for this example) detailing all the open and read calls made by these processes, and their results.

You do need a little processing afterwards to extract just the set of files that were successfully read, as you require; for example, with Python, you could do:

import re

# A completed call in the log looks like: PID SYSCALL(ARGS) = RESULT
linere = re.compile(r'^(\d+)\s+(\w+)\(([^)]+)\)\s+\=\s*(.*)$')

def main():
  openfiles = dict()   # fd (as text) -> file name
  filesread = set()
  with open('/tmp/blah') as f:
    for line in f:
      mo = linere.match(line)
      if mo is None:
        print("Unmatched line %r" % line)
        continue   # nothing to unpack for this line
      pid, command, args, results = mo.groups()
      if command == 'open':
        fn = args.split(',', 1)[0].strip('"')
        fd = results.split(' ', 1)[0]
        openfiles[fd] = fn
      elif command == 'read':
        # skip end-of-file (0) and failed (-1 ...) reads
        if results != '0' and not results.startswith('-1'):
          fd = args.split(',', 1)[0]
          if fd in openfiles:   # e.g. stdin was never opened in the log
            filesread.add(openfiles[fd])
      else:
        print("Unknown command %r" % command)
  print(sorted(filesread))

main()

This is a bit oversimplified (you need to watch some other syscalls such as dup etc.) but, I hope, shows the gist of the work needed. In my example, this emits:

['/lib/libc.so.6', '/lib/libdl.so.2', '/lib/libncurses.so.5',
 '/proc/meminfo', '/proc/sys/kernel/ngroups_max',
 '/usr/share/locale/locale.alias', 'ciao.txt']

so it also counts as "reads" those that are done to load dynamic libraries etc., not just "data files"... at syscall level, there's little difference. I imagine you could filter non-data files out, if that's what you need.

I find strace so handy for such purposes that, were I tasked to do the same job on Windows, my first try would be to go for StraceNT -- not 100% compatible, and of course the underlying syscall names etc. differ, but I think I could account for these differences in my Python code (preparing and executing the strace command, and post-processing the results).

Unfortunately, some other Unix systems, to my knowledge, only offer this kind of facility if you're root (super-user) -- e.g. on Mac OS X you need to go via sudo in order to execute such tracing utilities as dtrace and dtruss; I don't know of a straightforward port of strace to the Mac, nor of other ways to perform such tasks without root privileges.
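As a follow-up to the "oversimplified" caveat above, here is a minimal sketch of what watching a few more syscalls could look like. It is only an illustration under stated assumptions, not the answer's original code: the function name files_read is made up for this example, the log is assumed to come from strace -f -o (so every line starts with a PID), and descriptors inherited across fork or duplicated through fcntl are not tracked. It also handles openat, since libc on recent Linux systems typically issues openat rather than open, and a filter of -eopen,read alone may then miss the opens.

```python
import re
from collections import defaultdict

# Completed calls in an "strace -f -o" log look like: PID SYSCALL(ARGS) = RESULT
LINE_RE = re.compile(r'^(\d+)\s+(\w+)\((.*)\)\s+=\s+(-?\d+)')

def files_read(lines):
    """Return the sorted names of files with at least one successful read.

    Hypothetical helper for illustration; fds inherited from a parent
    process are not followed.
    """
    tables = defaultdict(dict)   # pid -> {fd: path}
    seen = set()
    for line in lines:
        mo = LINE_RE.match(line)
        if mo is None:
            continue  # signals, exit notices, unfinished/resumed calls
        pid, call, args = mo.group(1), mo.group(2), mo.group(3)
        ret = int(mo.group(4))
        fds = tables[pid]
        if call in ('open', 'openat') and ret >= 0:
            # The path is the first double-quoted argument.
            quoted = re.search(r'"([^"]*)"', args)
            if quoted:
                fds[ret] = quoted.group(1)
        elif call in ('dup', 'dup2', 'dup3') and ret >= 0:
            old = int(args.split(',', 1)[0])
            if old in fds:
                fds[ret] = fds[old]   # new fd refers to the same file
        elif call == 'close' and ret == 0:
            fds.pop(int(args), None)  # fd number may be reused later
        elif call == 'read' and ret > 0:
            fd = int(args.split(',', 1)[0])
            if fd in fds:
                seen.add(fds[fd])
    return sorted(seen)
```

With a log produced by something like strace -o /tmp/blah -f -e trace=open,openat,read,dup,dup2,dup3,close bash -c "cat ciao.txt", the function could be called as files_read(open('/tmp/blah')). Keeping one descriptor table per PID avoids mixing up descriptor numbers that different processes reuse for different files.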

友欢 2024-08-12 21:41:38

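Try Process Monitor (procmon.exe). It lets you specify filters (such as the name of the process to watch) and then lists all files and the operations performed on them.

On Linux, try lsof for a snapshot of the current state and strace for continuous monitoring. You will have to filter the output with grep.

All of these tools inspect the process structures (i.e. the data structures the OS uses to manage processes) and enumerate the handles/file descriptors mentioned there. This is a feature of the process management API rather than of the file system API.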
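To make the idea of enumerating a process's descriptors concrete, here is a small sketch for Linux only. It assumes procfs is mounted at /proc and that you have permission to inspect the target process; the function name open_files is made up for this example. It yields a point-in-time snapshot, roughly the kind of information lsof reports, so it cannot tell you which files were opened and closed between two snapshots.

```python
import os

def open_files(pid):
    """Return {fd: target} for the descriptors a process holds right now.

    Hypothetical helper; relies on the Linux /proc/PID/fd directory, where
    each entry is a symlink to the file, pipe or socket behind that fd.
    """
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # descriptor was closed between listdir() and readlink()
    return result
```

For example, open_files(os.getpid()) lists the descriptors of the current interpreter. Because this is polling, it has exactly the timing issues the question wants to avoid, which is why tracing (strace) is the better fit there.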

[EDIT] See the "How it works" section on this page to get started writing your own tool on Windows.

杀手六號 2024-08-12 21:41:38

An option -d (--watchfd) was added to pv in 2014 to watch the file descriptors opened by a given PID.

Easy to remember and helpful for debugging.

pv --help
  -d, --watchfd PID[:FD]   watch file FD opened by process PID

For example, to watch a process by its name:

pv -d `pgrep firefox`
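If you want similar information from your own code rather than from pv, one option on Linux is to sample the offset of a descriptor from procfs. The sketch below is an assumption about what is available under /proc/PID/fdinfo/FD (a "pos:" field holding the current file offset), not a description of how pv is implemented; the function name fd_offset is made up for this example.

```python
import os

def fd_offset(pid, fd):
    """Return the current file offset of descriptor fd in process pid.

    Hypothetical helper; parses the "pos:" line of /proc/PID/fdinfo/FD,
    which is Linux-specific.
    """
    with open("/proc/%d/fdinfo/%d" % (pid, fd)) as info:
        for line in info:
            key, _, value = line.partition(":")
            if key == "pos":
                return int(value)
    raise ValueError("no pos field for fd %d of pid %d" % (fd, pid))
```

A nonzero offset on a file opened for reading suggests that the process has read from it, but like any polling approach it can miss files that are opened and closed between samples.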

About the author: 时光礼记
