Java map / nio / NFS problem causing a VM fault: "a fault occurred in a recent unsafe memory access operation in compiled Java code"

Posted on 2024-09-03 21:34:20

I have written a parser class for a particular binary format (nfdump, if anyone is interested) which uses java.nio's MappedByteBuffer to read through files of a few GB each. The binary format is just a series of headers and mostly fixed-size binary records, which are fed out to the caller via calls to nextRecord(); each call advances the state machine and returns null when it's done. It performs well. It works on a development machine.

On my production host, it can run for a few minutes or hours, but always seems to throw "java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code", pointing at one of the map.getInt / getShort calls, i.e. a read operation on the mapped buffer.

The uncontroversial (?) code that sets up the map is this:

    /** Set up the map from the given filename and position */
    protected void open() throws IOException {
        // Set up buffer; is this all the flexibility we'll need?
        channel = new FileInputStream(file).getChannel();
        MappedByteBuffer map1 = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        map1.load(); // we want the whole thing, plus it seems to reduce the frequency of crashes?
        map = map1;
        // assumes the host writing the files is little-endian (x86); ought to be configurable
        map.order(java.nio.ByteOrder.LITTLE_ENDIAN);
        map.position(position);
    }

and then I use the various map.get* methods to read shorts, ints, longs and other sequences of bytes, before hitting the end of the file and closing the map.
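For context, a minimal sketch of the kind of record-reading loop described above might look like the following; the RECORD_SIZE constant, class name and field layout are made up for illustration and are not the actual nfdump format or the original parser's code:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteOrder;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class RecordReaderSketch {
        // Hypothetical fixed record size; the real nfdump format mixes
        // variable headers with mostly fixed-size records.
        private static final int RECORD_SIZE = 48;

        private final MappedByteBuffer map;

        public RecordReaderSketch(String path) throws IOException {
            FileChannel channel = new FileInputStream(path).getChannel();
            map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            map.order(ByteOrder.LITTLE_ENDIAN); // files written by a little-endian host
        }

        /** Returns the next (made-up) record as a long[], or null at end of file. */
        public long[] nextRecord() {
            if (map.remaining() < RECORD_SIZE) {
                return null;
            }
            // Relative get* calls advance the buffer position through the record.
            int type = map.getShort();
            int length = map.getShort();
            long first = map.getLong();
            long last = map.getLong();
            map.position(map.position() + (RECORD_SIZE - 20)); // skip the rest of the record
            return new long[] { type, length, first, last };
        }
    }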

I've never seen the exception thrown on my development host. But the significant point of difference between my production host and development is that on the former, I am reading sequences of these files over NFS (probably 6-8TB eventually, still growing). On my dev machine, I have a smaller selection of these files locally (60GB), but when it blows up on the production host it's usually well before it gets to 60GB of data.

Both machines are running java 1.6.0_20-b02, though the production host is running Debian/lenny and the dev host Ubuntu/karmic. I'm not convinced that makes any difference. Both machines have 16GB RAM and are running with the same java heap settings.

I take the view that if there is a bug in my code, there is enough of a bug in the JVM not to throw me a proper exception! But I think it is just a particular JVM implementation bug due to interactions between NFS and mmap, possibly a recurrence of 6244515 which is officially fixed.

I already tried adding in a "load" call to force the MappedByteBuffer to load its contents into RAM - this seemed to delay the error in the one test run I've done, but not prevent it. Or it could be coincidence that that was the longest it had gone before crashing!

If you've read this far and have done this kind of thing with java.nio before, what would your instinct be? Right now mine is to rewrite it without nio :)

Comments (1)

爱冒险 2024-09-10 21:34:20

I would rewrite it without using mapped NIO. If you're dealing with more than one file, there is a problem that the mapped memory is never released, so you will run out of virtual memory: NB this isn't necessarily just an OutOfMemoryError that interacts with the garbage collector; it could be a failure to allocate a new mapped buffer. I would use a FileChannel.
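As a rough sketch of what that non-mapped approach could look like (the buffer size, class name and loop structure here are assumptions for illustration, not the answerer's code), reading through an explicit buffer keeps no long-lived mapping of the file in the process address space:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;

    public class ChannelReaderSketch {
        public static void main(String[] args) throws IOException {
            FileChannel channel = new FileInputStream(args[0]).getChannel();
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB; size is arbitrary
            buf.order(ByteOrder.LITTLE_ENDIAN);
            long total = 0;
            while (channel.read(buf) != -1) {
                buf.flip();
                // ... parse complete records out of buf here ...
                total += buf.remaining();
                buf.clear(); // a real parser must carry over any partial record
            }
            channel.close();
            System.out.println("read " + total + " bytes");
        }
    }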

Having said that, large-scale operations on NFS files are always extremely problematic. You would be much better off redesigning the system so that each file is read by its local CPU. You will also get immense speed improvements this way, far more than the 20% you will lose by not using mapped buffers.
