How can I profile file I/O?
Our build is annoyingly slow. It's a Java system built with Ant, and I'm running mine on Windows XP. Depending on the hardware, it can take between 5 and 15 minutes to complete.
Watching overall performance metrics on the machine, as well as correlating hardware differences with build times, indicates that the process is I/O bound. It also shows that the process does a lot more reading than writing.
However, I haven't found a good way to determine which files are being read or written, and how many times. My suspicion is that with our many subprojects and subsequent invocations of the compiler, the build is re-reading the same commonly used libraries many times.
What are some profiling tools that will tell me what a given process is doing with which files? Free is nice, but not essential.
Using Process Monitor, as suggested by Jon Skeet, I was able to confirm my suspicion: almost all of the disk activity was reading and re-reading of libraries, with the JDK's copies of "rt.jar" and other libraries at the top of the list. I can't make a RAM disk large enough to hold all the libraries I use, but mounting the "hottest" libraries on a RAM disk cut build times by about 40%; clearly, Windows file system caching isn't doing a good enough job, even though I've told Windows to optimize for that.
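For anyone wanting to reproduce that ranking outside the Process Monitor GUI, here is a minimal Python sketch that tallies ReadFile events per path from a CSV export (Process Monitor can save captured events as CSV). It assumes the export has the default "Operation", "Path" and "Detail" columns and that the read size appears in Detail as "Length: N", possibly with thousands separators; check those assumptions against your own export before trusting the numbers. hottest_files is a made-up helper name.

```python
import csv
import re
from collections import Counter, defaultdict

# Assumption: rows come from a Process Monitor CSV export with its default
# "Operation", "Path" and "Detail" columns, and ReadFile rows describe the
# read size in Detail as e.g. "Offset: 12,345, Length: 4,096, ...".
LENGTH_RE = re.compile(r"Length:\s*(\d[\d,]*)")


def hottest_files(lines, top=10):
    """Rank paths by number of ReadFile events; also total the bytes read.

    hottest_files is an illustrative helper, not part of Process Monitor.
    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    reads = Counter()
    bytes_read = defaultdict(int)
    for row in csv.DictReader(lines):
        if row.get("Operation") != "ReadFile":
            continue
        path = row.get("Path") or ""
        reads[path] += 1
        match = LENGTH_RE.search(row.get("Detail") or "")
        if match:
            # Strip thousands separators (and any trailing comma the pattern
            # picked up) before converting to an integer.
            bytes_read[path] += int(match.group(1).replace(",", ""))
    return [(path, n, bytes_read[path]) for path, n in reads.most_common(top)]
```

Passing it an open file, for example opened with newline="" and encoding="utf-8-sig" to cope with a possible byte-order mark, would list the ten most frequently read files together with the bytes read from each.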
One interesting thing I noticed is that the typical 'read' operation on a JAR file is just a few dozen bytes; usually there are two or three of these, followed by a skip of several kilobytes further into the file. The access pattern seems ill-suited to bulk reads.
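That pattern is roughly what the ZIP format (which JAR files use) implies: each entry starts with a small fixed-size local header (30 bytes, followed by the file name), and a reader locates entries through the central directory at the end of the archive and then seeks to each one. The sketch below illustrates this with Python's zipfile module rather than the JVM's own reader, so the exact sizes and ordering will differ from what Process Monitor shows for java.exe. CountingBytesIO, read_pattern and make_sample_jar are made-up helpers for the illustration.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that records the size of every read() it serves.

    Illustrative helper; stands in for the JAR file on disk.
    """

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        chunk = super().read(size)
        self.read_sizes.append(len(chunk))
        return chunk


def read_pattern(archive_bytes):
    """Read every entry of a ZIP/JAR and return the sizes of the reads issued."""
    fp = CountingBytesIO(archive_bytes)
    with zipfile.ZipFile(fp) as zf:
        for name in zf.namelist():
            zf.read(name)
    return fp.read_sizes


def make_sample_jar(n_entries=5, entry_size=2000):
    """Build a small stored (uncompressed) archive standing in for a JAR."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(n_entries):
            zf.writestr(f"pkg/Class{i}.class", bytes(entry_size))
    return buf.getvalue()
```

After the reads for the end-of-archive records and the central directory, read_pattern(make_sample_jar()) should show, for each entry, a 30-byte header read, a short file-name read and then a read of the entry data: the same "few small reads, then a jump" shape described above.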
I'm going to do more testing with all of my third-party libraries on a flash drive, and see what effect that has.
If you only need it for Windows, SysInternals Process Monitor should show you everything you need to know. You can select the process, then see each operation as it happens and get a summary of file operations as well.
An oldie but a goodie: create a RAM disk and compile your files from there.
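If the RAM disk can't hold everything (as in the question's update), one option is to copy only the hottest JARs onto it and point the classpath at the copies. A minimal Python sketch, assuming the RAM disk is already mounted and writable at the directory you pass in; creating the RAM disk itself needs a separate driver or tool, which this does not cover. stage_libraries is a hypothetical helper, not part of Ant.

```python
import os
import shutil


def stage_libraries(jar_paths, ram_dir):
    """Copy JARs into ram_dir and return a classpath string for the copies.

    stage_libraries is an illustrative helper (not part of Ant). ram_dir is
    assumed to be on a RAM disk that is already mounted and writable.
    """
    os.makedirs(ram_dir, exist_ok=True)
    staged = []
    for jar in jar_paths:
        target = os.path.join(ram_dir, os.path.basename(jar))
        if target in staged:
            # Two JARs with the same file name would overwrite each other.
            raise ValueError(f"duplicate library name: {os.path.basename(jar)}")
        shutil.copy2(jar, target)  # copy2 also preserves the modification time
        staged.append(target)
    return os.pathsep.join(staged)
```

The returned string uses the platform's path separator (";" on Windows), so it could be handed to the build as a classpath; how to wire it into build.xml depends on how your build defines its classpath.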
Back when I still used Windows, I got good results speeding up my build by having all build output written to a separate partition of maybe 3 GB, and formatting that partition once a week, at night, via a scheduled task. It's just build output, so it doesn't matter if it gets unilaterally flattened occasionally.
But honestly, since moving to Linux, disk fragmentation is something I never worry about any more.
Another reason to try your build on Linux, at least once, is so that you can run strace (grepped for calls to open) to see what files your build is touching.
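A rough Python sketch of the "grepped for open" step, for output captured with something like strace -f -e trace=open,openat -o trace.txt ant (assuming the build is started with plain ant). It counts successful open/openat calls per JAR path. Lines that strace splits into "<unfinished ...>" and "resumed" halves under -f, and lines carrying -t/-tt timestamps, are not matched, so the counts can come out low. Opens are not reads either: one open can be followed by many reads, so this mostly shows how many times each library was opened. count_opens is a made-up helper name.

```python
import re
from collections import Counter

# Matches lines such as
#   openat(AT_FDCWD, "/usr/lib/jvm/java/jre/lib/rt.jar", O_RDONLY) = 3
#   open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
# optionally prefixed by a PID ("1234  openat(...)" with -f -o,
# "[pid  1234] openat(...)" when writing to stderr).
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s*|\d+\s+)?'   # optional PID prefix added by -f
    r'(?:open|openat)\('
    r'(?:[^",]*,\s*)?'                 # openat's directory-fd argument
    r'"([^"]*)"'                       # the path
    r'[^)]*\)\s*=\s*(-?\d+)'           # flags/mode, then the return value
)


def count_opens(lines, suffix=".jar"):
    """Count successful open/openat calls per path ending in suffix.

    count_opens is an illustrative helper, not part of strace.
    """
    counts = Counter()
    for line in lines:
        m = OPEN_RE.match(line)
        # A negative return value (e.g. -1 ENOENT) means the open failed.
        if m and int(m.group(2)) >= 0 and m.group(1).endswith(suffix):
            counts[m.group(1)] += 1
    return counts
```

Calling most_common(20) on the result would give the same kind of "top of the list" view that Process Monitor gives on Windows.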
I used to build a massive Java webapp (JSP frontend) using Ant on Windows and it would take upwards of 3 minutes. I wiped my computer and installed Linux, and suddenly the builds took 18 seconds. Those are real numbers, albeit about 3 years old. I can only assume that Java prefers the Linux memory management and threading models to the Windows equivalents, as all Java programs appear to run better under Linux in my experience (especially Eclipse). Linux seems a lot better about preventing extra reads from the disk when you're doing a lot of reading of files that haven't changed (i.e. executables and libraries). This may be a property of the disk cache or the filesystem; I'm not sure which.
One of the great things about Java is that it's cross-platform, so setting up a Linux-based build server is actually an option for you. Being something of a Linux evangelist, I'd of course prefer to see you switch your dev environment to Linux, but I know that a lot of people don't want to do that (or can't for practical reasons).
If you're not willing to even set up a Linux build server to see if it runs faster, you could at least try defragmenting your Windows machine's hard drive. That makes a huge difference for C++ builds on my work computer. Try JkDefrag, which seems a lot better than the defragmenter that comes with Windows.
EDIT: I'd assume I got a downvote because my answer doesn't address the exact question asked. It is, however, in the tradition of StackOverflow to help people fix their real problem, not just treat the symptoms. I'm not one of those people for whom the answer to every question is "use linux". In this instance, however, I have very real, measured performance gains in exactly the situation the OP is asking about, so I thought it worth sharing my experiences.
Actually FileMon is a more direct tool than ProcMon. In general, when running performance analysis for disk I/O, consider the following two:
Once you evaluate the performance of your system in terms of the above, it is easy to identify the bottleneck and take corrective action: get faster disks or change your code (whichever works out cheaper).