Java GridGain application starts failing after 1 day of stress testing
So I have an application which is running on top of GridGain, and does so quite successfully for about 12-24 hours of stress testing before it starts to act funny. After this period of time the application will suddenly start replying to all queries with the exception java.nio.channels.ClosedByInterruptException (full stack trace is at http://pastie.org/664717).
The method that is failing is (edited to use @stephenc's feedback):
public static com.vlc.edge.FileChannel createChannel(final File file) {
    FileChannel channel = null;
    try {
        channel = new FileInputStream(file).getChannel();
        channel.position(0);
        final com.vlc.edge.FileChannel fileChannel = new FileChannelImpl(channel);
        channel = null; // success: ownership now belongs to fileChannel
        return fileChannel;
    } catch (FileNotFoundException e) {
        throw new VlcRuntimeException("Failed to open file: " + file, e);
    } catch (IOException e) {
        throw new VlcRuntimeException(e);
    } finally {
        // channel is only non-null here if an exception was thrown above
        if (channel != null) {
            try {
                channel.close();
            } catch (IOException e) {
                // nothing more we can do here; just log it
                LOGGER.error("There was a problem closing the file: " + file);
            }
        }
    }
}
and the calling function correctly closes the channel it is given:
private void fillContactBuffer(final File signFile) {
    contactBuffer = ByteBuffer.allocate((int) signFile.length());
    final FileChannel channel = FileUtils.createChannel(signFile);
    try {
        channel.read(contactBuffer);
    } finally {
        channel.close();
    }
    contactBuffer.rewind();
}
The application basically serves as a distributed file parser, so it does a lot of these types of operations (it will typically open about 10 such channels per query per node). It seems that after a certain period it stops being able to open files, and I'm at a loss to explain why this could be happening. I would greatly appreciate it if anyone can tell me what could be causing this and how I could go about tracking it down and fixing it. If it is possibly related to file handle exhaustion, I'd love to hear any tips for finding out for sure... i.e. querying the JVM while it's running, or using Linux command-line tools to find out more about which handles are currently open.
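For the handle-exhaustion theory, lsof -p <pid> (or ls /proc/<pid>/fd on Linux) lists what a process currently has open, and the JVM can also report its own descriptor count. A minimal sketch, assuming a HotSpot/OpenJDK-style JVM on a Unix-like OS (the com.sun.management MXBean is JDK-specific, so this cast is not guaranteed to succeed on every VM):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public final class FdMonitor {
    /** Logs how many file descriptors this JVM currently holds open. */
    public static void logOpenDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " of max " + unix.getMaxFileDescriptorCount());
        }
    }
}

Calling this periodically under load would show whether the count creeps toward the maximum while traffic stays steady, which would point at a descriptor leak.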
Update: I've been using command-line tools to interrogate the output of lsof and haven't been able to see any evidence that file handles are being held open... each node in the grid has a very stable profile of opened files, which I can see changing as the above code is executed... but it always returns to a stable number of open files.
Related to this question: Freeing java file handles
1 Answer
There are a couple of scenarios where file handles might not be being closed:

- Something calls createChannel(...) and doesn't call fillContactBuffer(...).
- If channel.position(0) throws an exception, the channel won't be closed. The fix is to rearrange the code so that the following statements are inside the try block (see the sketch after this list).
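To make the second scenario concrete, here is a hypothetical reconstruction of what the pre-edit method may have looked like: there is no finally clause, so once position(0) throws, nothing ever closes the freshly opened channel and its descriptor leaks.

// Hypothetical pre-fix shape of createChannel (reconstructed for illustration)
public static com.vlc.edge.FileChannel createChannel(final File file) {
    try {
        FileChannel channel = new FileInputStream(file).getChannel();
        channel.position(0); // an IOException here leaks the open channel
        return new FileChannelImpl(channel);
    } catch (FileNotFoundException e) {
        throw new VlcRuntimeException("Failed to open file: " + file, e);
    } catch (IOException e) {
        throw new VlcRuntimeException(e); // rethrows without closing: leak
    }
}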
EDIT: Looking at the stack trace, it seems that the two methods are in different code-bases. I'd point the finger of blame at the createChannel method. It is potentially leaky, even if it is not the source of your problems. It needs an internal finally clause to make sure that the channel is closed in the event of an exception. Something like the sketch below should do the trick. Note that you need to make sure that the finally block does not close the channel on success!
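The answer's code sample did not survive this copy; the idiom it describes is presumably the null-out pattern that the question's edited createChannel now uses. A condensed sketch of that pattern (names such as FileChannelImpl come from the question's own code):

public static com.vlc.edge.FileChannel createChannel(final File file) throws IOException {
    FileChannel channel = null;
    try {
        channel = new FileInputStream(file).getChannel();
        channel.position(0);
        com.vlc.edge.FileChannel result = new FileChannelImpl(channel);
        channel = null;        // success: ownership transferred, skip the close below
        return result;
    } finally {
        if (channel != null) { // non-null only if an exception was thrown above
            channel.close();
        }
    }
}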
FOLLOWUP, much later:

Given that file handle leakage has been eliminated as a possible cause, my next theory would be that the server side is actually interrupting its own threads using Thread.interrupt(). Some low-level I/O calls respond to an interrupt by throwing an exception, and the root exception being thrown here looks like one such exception.

This doesn't explain why this is happening, but at a wild guess I'd say that it was the server-side framework trying to resolve an overload or deadlock problem.
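For what it's worth, FileChannel is an InterruptibleChannel, so even a pending interrupt flag is enough to make the next read fail with exactly this exception. A minimal, self-contained demonstration (the file path is purely illustrative):

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;

public class InterruptDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = new FileInputStream("/etc/hosts").getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(64);
            // Simulate a framework calling Thread.interrupt() on its worker:
            Thread.currentThread().interrupt();
            channel.read(buffer); // interruptible I/O: the channel is closed and...
        } catch (ClosedByInterruptException e) {
            // ...this is thrown: the same root exception as in the question.
            System.out.println("read failed: " + e);
        } finally {
            Thread.interrupted(); // clear the flag so later blocking calls work
        }
    }
}

If the GridGain node's worker threads are being interrupted this way (for example during job cancellation or failover), any file read in flight on that thread would fail exactly as described.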