Multithreaded web application runs out of file descriptors for mmapped files despite high limits
I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
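One sanity check worth doing first is whether the process actually sees those raised limits: pam_limits only applies /etc/security/limits.conf to login sessions, so a daemon started by init may never pick up the new values, and a "hard nofile" entry by itself does not raise the soft limit, which is the one open() actually enforces. A minimal, Linux-specific sketch to print what this JVM process really has:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Prints the live soft/hard limits for this process from /proc/self/limits
// (Linux-specific). The soft "Max open files" value is what makes open()
// fail with EMFILE, regardless of what limits.conf says.
public class ShowLimits {
    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader("/proc/self/limits"));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("Limit") || line.contains("open files")) {
                    System.out.println(line);
                }
            }
        } finally {
            r.close();
        }
    }
}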
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files in either lsof or /proc/{pid}/fd is about the expected 2000-5000.
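To cross-check what lsof reports from inside the process itself, counting /proc/self/fd is cheap enough to log periodically; a small sketch:

import java.io.File;

// Counts this process's open descriptors via /proc/self/fd (Linux-specific).
// Each entry is one descriptor in the process-wide table; the count briefly
// includes the descriptor used to read the directory itself.
public class CountFds {
    public static void main(String[] args) {
        String[] fds = new File("/proc/self/fd").list();
        System.out.println("open fds: " + (fds == null ? -1 : fds.length));
    }
}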
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmapped files were somehow taking up one file descriptor per thread, then the numbers would add up. That said, doing a recursive count on the things in /proc/{pid}/tasks/*/fd currently lists 215575 fds, so it would seem that it should be already hitting the limits and it's not, so that seems unlikely. (Presumably each thread's fd directory just lists the same shared per-process descriptor table again: 215575 / 75 is about 2875, right in line with what lsof shows. See the sketch after the next paragraph for the mmap side of this.)
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
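On the mmap theory itself: at least on the Java side, an established mapping shouldn't pin a descriptor at all. Closing the channel releases the fd, while the MappedByteBuffer stays valid until it is garbage-collected. A minimal sketch (/tmp/mmap-demo is just an illustrative scratch file):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Shows that a mapping outlives its descriptor: after the channel is closed,
// the fd is gone from /proc/self/fd, but the buffer still reads fine because
// the kernel keeps the mapping alive until the buffer is garbage-collected.
public class MapThenClose {
    public static void main(String[] args) throws IOException {
        File f = new File("/tmp/mmap-demo"); // illustrative scratch file
        FileOutputStream out = new FileOutputStream(f);
        out.write("hello".getBytes());
        out.close();

        RandomAccessFile raf = new RandomAccessFile(f, "r");
        FileChannel ch = raf.getChannel();
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, f.length());
        ch.close(); // the descriptor is released here...
        raf.close();

        System.out.println((char) buf.get(0)); // ...but the mapping still works: prints 'h'
    }
}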
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
2 Answers
So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. In the month since upgrading to 10.04, there hasn't been a single instance of this problem. The configuration didn't change, so I'm led to believe that this must have been a kernel bug.
Your setup uses a huge chunk of code that may be guilty of leaking too: the JVM. Maybe you can switch between the Sun and the open-source JVMs as a way to check whether that code is by chance the culprit. There are also different garbage-collector strategies available for the JVM; using a different one, or different heap sizes, will cause more or fewer garbage collections (which in Java includes the closing of descriptors).
I know it's kind of far-fetched, but it seems like you've already exhausted all the other options ;)
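If it helps, here's a tiny sketch of the failure mode I mean (/etc/hostname is just an arbitrary readable file, and exact counts depend on the JVM): a stream that is dropped without close() keeps its descriptor until finalization runs after some later GC, so descriptor usage can spike between collections:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Leaks descriptors on purpose: each FileInputStream is dropped without
// close(), so its fd is only reclaimed once the finalizer runs after a GC.
public class FinalizerLeak {
    public static void main(String[] args) throws IOException, InterruptedException {
        for (int i = 0; i < 500; i++) {                  // stay under the default 1024 soft limit
            new FileInputStream("/etc/hostname").read(); // arbitrary readable file
        }
        System.out.println("open fds before GC: " + new File("/proc/self/fd").list().length);

        System.gc();               // only a hint; timing is entirely up to the JVM
        System.runFinalization();
        Thread.sleep(100);         // give the finalizer thread a moment
        System.out.println("open fds after GC:  " + new File("/proc/self/fd").list().length);
    }
}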