Too many open file handles
I'm working on a huge legacy Java application, with a lot of handwritten stuff, which nowadays you'd let a framework handle.
The problem I'm facing right now is that we are running out of file handles on our Solaris server. I'd like to know: what's the best way to track open file handles? Where should I look, and what can cause open file handles to run out?
I cannot debug the application under Solaris, only in my Windows development environment. Is it even reasonable to analyze the open file handles under Windows?
12 Answers
One good thing I've found for tracking down unclosed file handles is FindBugs:
http://findbugs.sourceforge.net/
It checks many things, but one of the most useful is resource open/close operations. It's a static analysis program that runs on your source code, and it's also available as an Eclipse plugin.
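As a hedged illustration, this is the kind of code FindBugs' resource checks (the "OS: Method may fail to close stream" detector) will flag; the class and method names are made up for the example:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class LeakExample {
        // FindBugs flags this: if readLine() throws, close() is never
        // reached and the file handle leaks until finalization.
        static String firstLine(String path) throws IOException {
            BufferedReader reader = new BufferedReader(new FileReader(path));
            String line = reader.readLine();
            reader.close();
            return line;
        }
    }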
On Windows you can look at open file handles using Process Explorer:
http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
On Solaris you can use "lsof" to monitor the open file handles.
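For example (hedged; the PID is a placeholder for your Java process ID):

    # Count how many descriptors the process currently holds open.
    lsof -p 12345 | wc -l

    # List them, to see which files and sockets they point at.
    lsof -p 12345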
This little script helped me keep an eye on the count of open files when I needed to test the fd count.
It was used on Linux, so for Solaris you may have to patch it.
You can also try playing with the JVM option -Xverify:none - it should disable JAR verification (useful if most of the open files are JARs...).
For leaks through unclosed FileOutputStreams you can use FindBugs (mentioned above), or try to find the article on how to patch the standard Java FileOutputStream/FileInputStream so that you can see who opened files and forgot to close them. Unfortunately I cannot find that article right now, but it does exist. :)
Also think about increasing the file limit - for up-to-date *nix kernels it is not a problem to handle more than 1024 fds.
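The script itself wasn't preserved on this page; a rough sketch of what such a monitor might look like on Linux (the PID is a placeholder):

    # Hypothetical reconstruction - prints the open-fd count once per second.
    # Linux exposes one entry per descriptor under /proc/<pid>/fd;
    # Solaris has the same directory, or you can use pfiles.
    PID=12345
    while true; do
        echo "$(date '+%H:%M:%S') $(ls /proc/$PID/fd | wc -l) open fds"
        sleep 1
    done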
To answer the second part of the question:
Opening a lot of files, obviously, and then not closing them.
The simplest scenario is that the references to whatever objects hold the native handles (e.g., FileInputStream) are thrown away before being closed, which means the files remain open until the objects are finalized.
The other option is that the objects are stored somewhere and not closed. A heap dump might be able to tell you what lingers where (jmap and jhat are included in the JDK, or you can use jvisualvm if you want a GUI). You're probably interested in looking for objects owning FileDescriptors.
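For concreteness, a hedged example of using those JDK tools (the PID and dump file name are placeholders):

    # Dump the live heap of the Java process to a file...
    jmap -dump:live,format=b,file=heap.hprof 12345

    # ...then browse it at http://localhost:7000 and look for
    # java.io.FileDescriptor instances and whatever references them.
    jhat heap.hprof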
This may not be practical in your case, but what I did once when I had a similar problem with open database connections was override the "open" function with my own. (Conveniently I already had this function because we had written our own connection pooling.) In my function I then added an entry to a table recording the open. I made a stack trace call and saved the identity of the caller, along with the time called, and I forget what else. When the connection was released, I deleted the table entry. Then I had a screen where we could dump the list of open entries. You could then look at the timestamps and easily see which connections had been open for unlikely amounts of time, and which functions had done these opens. (A sketch of the same idea applied to file streams follows below.)
From this we were able to quickly track down the couple of functions that were opening connections and failing to close them.
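Applied to file handles, a hedged sketch of that idea - the class name and details are hypothetical, not from the original code:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Records who opened each stream and when; long-lived entries in the
    // table point straight at the code that forgot to close.
    public class TrackedFileInputStream extends FileInputStream {
        private static final Map<TrackedFileInputStream, String> OPEN =
                new ConcurrentHashMap<TrackedFileInputStream, String>();

        public TrackedFileInputStream(String name) throws IOException {
            super(name);
            // Save the opener's stack trace together with the open time.
            StringWriter sw = new StringWriter();
            new Throwable("opened " + name + " at " + new java.util.Date())
                    .printStackTrace(new PrintWriter(sw));
            OPEN.put(this, sw.toString());
        }

        @Override
        public void close() throws IOException {
            OPEN.remove(this); // released: delete the table entry
            super.close();
        }

        // The "screen": dump the list of currently open entries.
        public static void dumpOpenEntries() {
            for (String entry : OPEN.values()) {
                System.err.println(entry);
            }
        }
    }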
If you have lots of open file handles, the odds are that you're failing to close them when you're done somewhere. You say you've checked for proper try/finally blocks, but I'd suspect that somewhere in the code you either missed a bad one, or you have a function that hangs and never makes it to the finally. I suppose it's also possible that you really are doing proper closes every time you open a file, but you are opening hundreds of files simultaneously. If that's the case, I'm not sure what you can do other than a serious program redesign to manipulate fewer files, or a serious program redesign to queue your file accesses. (At this point I add the usual, "without knowing the details of your application, etc.")
It's worth bearing in mind that open sockets also consume file handles on Unix systems. So it could very well be something like a database connection pool leak (e.g. open database connections not being closed and returned to the pool) that is leading to this issue - I have certainly seen this error caused by a connection pool leak before.
I would start by asking my sysadmin to get a listing of all open file descriptors for the process. Different systems do this in different ways: Linux, for example, has the /proc/PID/fd directory. I recall that Solaris has a command (maybe pfiles?) that will do the same thing - your sysadmin should know it.
However, unless you see a lot of references to the same file, an fd list isn't going to help you. If it's a server process, it probably has lots of files (and sockets) open for a reason. The only way to resolve the problem is to adjust the system limit on open files - you can also check the per-user limit with ulimit, but in most current installations that equals the system limit.
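Hedged examples of those commands (the PID is a placeholder):

    # Linux: one symlink per open descriptor
    ls -l /proc/12345/fd

    # Solaris: prints each descriptor and what it points to
    pfiles 12345

    # Current per-process descriptor limit for your shell/user
    ulimit -n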
I would double-check the environment settings on your Solaris box. I believe that by default Solaris only allows 256 file handles per process. For a server application, especially if it's running on a dedicated server, this is very low. Figure 50 or more descriptors for opening the JRE and library JARs, and then at least one descriptor for each incoming request and database query, probably more, and you can see how this just won't cut the mustard for a serious server.
Have a look at the /etc/system file, for the values of rlim_fd_cur and rlim_fd_max, to see what your system has set. Then consider whether this is reasonable. (You can see how many file descriptors are open while the server is running with the lsof command, ideally with the -p [process ID] parameter.)
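As a hedged illustration (the values are examples, not recommendations), raising those limits means adding lines like these to /etc/system and rebooting:

    * /etc/system - per-process file descriptor limits (illustrative values)
    set rlim_fd_cur=1024
    set rlim_fd_max=4096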
Not a direct answer to your question, but these problems could be the result of releasing file resources incorrectly in your legacy code. For example, if you're working with FileOutputStream classes, make sure the close methods are called in a finally block, as in this example:
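(The original example didn't survive on this page; a minimal sketch of the pattern it describes, with a hypothetical file name:)

    import java.io.FileOutputStream;
    import java.io.IOException;

    public class SafeWrite {
        static void write(byte[] data) throws IOException {
            FileOutputStream out = new FileOutputStream("out.bin");
            try {
                out.write(data);
            } finally {
                out.close(); // runs even if write() throws
            }
        }
    }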
It could certainly give you an idea. Since it's Java, the file open/close mechanics should be implemented similarly (unless one of the JVMs is implemented incorrectly). I would recommend using File Monitor on Windows.
Google for an app called Filemon from Sysinternals.
BTW, to track this down you may be able to use something like AspectJ to log all calls that open and close files and record where they occur.
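A hedged sketch of such an aspect (annotation-style AspectJ, woven with ajc or the load-time weaver; the class name is made up):

    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;

    // Logs every FileInputStream/FileOutputStream construction together
    // with the source location where it happens.
    @Aspect
    public class FileOpenLogger {
        @Before("call(java.io.FileInputStream.new(..)) || "
              + "call(java.io.FileOutputStream.new(..))")
        public void logOpen(JoinPoint jp) {
            System.err.println("File opened at " + jp.getSourceLocation());
        }
    }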
This is a coding pattern that helps find unclosed resources. It closes the resources and also complains in the log about the problem.
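(The code itself wasn't preserved on this page; a hedged sketch of the kind of pattern described, with hypothetical names:)

    import java.io.FileInputStream;
    import java.io.IOException;

    // Remembers where the stream was opened; if it is garbage-collected
    // without close() having been called, it complains in the log with
    // that stack trace and closes the resource itself.
    public class ComplainingFileInputStream extends FileInputStream {
        private final Throwable openedAt = new Throwable("opened here");
        private volatile boolean closed = false;

        public ComplainingFileInputStream(String name) throws IOException {
            super(name);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }

        @Override
        protected void finalize() throws Throwable {
            if (!closed) {
                System.err.println("Resource was never closed; opened at:");
                openedAt.printStackTrace();
                close(); // see the note below about wrapping this in try-catch
            }
            super.finalize();
        }
    }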
Wrap the above file.close() calls in try-catch blocks that ignore errors.
Also, Java 7 has a new 'try-with-resources' feature that can auto-close resources.
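A minimal illustration (the file name is hypothetical):

    import java.io.FileInputStream;
    import java.io.IOException;

    public class TryWithResources {
        public static void main(String[] args) throws IOException {
            // The stream is closed automatically when the block exits,
            // whether normally or via an exception.
            try (FileInputStream in = new FileInputStream("data.txt")) {
                System.out.println("first byte: " + in.read());
            }
        }
    }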