Java ByteBuffer performance issue
While processing multiple gigabyte files I noticed something odd: it seems that reading from a file using a filechannel into a re-used ByteBuffer object allocated with allocateDirect is much slower than reading from a MappedByteBuffer, in fact it is even slower than reading into byte-arrays using regular read calls!
I was expecting it to be (almost) as fast as reading from mappedbytebuffers as my ByteBuffer is allocated with allocateDirect, hence the read should end-up directly in my bytebuffer without any intermediate copies.
My question now is: what is it that I'm doing wrong? Or is bytebuffer+filechannel really slower than regular io/mmap?
In the example code below I also added some code that converts what is read into long values, as that is what my real code constantly does. I would expect the ByteBuffer getLong() method to be much faster than my own byte shuffler.
Test-results:
mmap: 3.828
bytebuffer: 55.097
regular i/o: 38.175
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class testbb {
    static final int size = 536870904, n = size / 24;

    static public long byteArrayToLong(byte[] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String[] args) throws IOException {
        long start;

        // create file
        RandomAccessFile fileHandle = new RandomAccessFile("file.dat", "rw");
        byte[] buffer = new byte[24];
        for (int index = 0; index < n; index++)
            fileHandle.write(buffer);
        FileChannel fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        byte[] buffer1 = new byte[24];
        start = System.currentTimeMillis();
        for (int index = 0; index < n; index++) {
            mbb.position(index * 24);
            mbb.get(buffer1, 0, 24);
            long dummy1 = byteArrayToLong(buffer1, 0);
            long dummy2 = byteArrayToLong(buffer1, 8);
            long dummy3 = byteArrayToLong(buffer1, 16);
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(24);
        start = System.currentTimeMillis();
        for (int index = 0; index < n; index++) {
            buffer2.rewind();
            fileChannel.read(buffer2, index * 24);
            buffer2.rewind(); // need to rewind it to be able to use it
            long dummy1 = buffer2.getLong();
            long dummy2 = buffer2.getLong();
            long dummy3 = buffer2.getLong();
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        byte[] buffer3 = new byte[24];
        start = System.currentTimeMillis();
        for (int index = 0; index < n; index++) {
            fileHandle.seek(index * 24);
            fileHandle.read(buffer3);
            long dummy1 = byteArrayToLong(buffer3, 0);
            long dummy2 = byteArrayToLong(buffer3, 8);
            long dummy3 = byteArrayToLong(buffer3, 16);
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}
As loading large sections and then processing them is not an option (I'll be reading data all over the place), I think I should stick to a MappedByteBuffer.
Thank you all for your suggestions.
Answers (4)
I believe you are just doing micro-optimization, which might just not matter (www.codinghorror.com).
Below is a version with a larger buffer and the redundant seek/setPosition calls removed.
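A minimal sketch of what such a version could look like: one large byte[] is filled sequentially, so there is one read call per thousand records instead of one seek plus one read per record. The class name, file handling, and sizes are illustrative, not from the original answer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class TestSequential {
    // same conversion as in the question, written as a loop
    static long byteArrayToLong(byte[] in, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | (in[offset + i] & 0xff);
        return v;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("seq", ".dat");
        file.deleteOnExit();
        final int records = 4096; // small size for the sketch

        RandomAccessFile f = new RandomAccessFile(file, "rw");
        for (int i = 0; i < records; i++) { // 24-byte records: index, 0, 0
            f.writeLong(i);
            f.writeLong(0);
            f.writeLong(0);
        }

        // one big buffer, read sequentially: no seek() per record
        byte[] chunk = new byte[24 * 1024];
        f.seek(0);
        long remaining = f.length(), checksum = 0;
        while (remaining > 0) {
            int toRead = (int) Math.min(chunk.length, remaining);
            f.readFully(chunk, 0, toRead); // one call covers 1024 records
            for (int off = 0; off + 24 <= toRead; off += 24)
                checksum += byteArrayToLong(chunk, off);
            remaining -= toRead;
        }
        f.close();
        System.out.println("checksum: " + checksum); // sum of 0..4095 = 8386560
    }
}
```

The chunk size is kept a multiple of the 24-byte record size so no record ever straddles two reads.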
读入直接字节缓冲区速度更快,但将数据从其中读取到 JVM 速度较慢。直接字节缓冲区适用于仅复制数据而不在 Java 代码中实际查看数据的情况。然后它根本不必跨越本机-> JVM 边界,因此它比使用例如 byte[] 数组或普通 ByteBuffer 更快,在后者中数据在复制过程中必须跨越该边界两次。
Reading into the direct byte buffer is faster, but getting the data out of it into the JVM is slower. A direct byte buffer is intended for cases where you're just copying the data without actually looking at it in the Java code. Then it doesn't have to cross the native-to-JVM boundary at all, so it's quicker than using e.g. a byte[] array or a normal ByteBuffer, where the data would have to cross that boundary twice in the copy process.
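For instance, a file-to-file copy never needs the bytes on the Java side, so a reused direct buffer can shuttle them between two channels without the data ever landing in a byte[] on the heap. A sketch, with illustrative names and sizes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DirectCopy {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".dat");
        Path dst = Files.createTempFile("dst", ".dat");
        byte[] payload = new byte[100_000];
        for (int i = 0; i < payload.length; i++)
            payload[i] = (byte) i;
        Files.write(src, payload);

        // the bytes go kernel -> direct buffer -> kernel, never into a byte[]
        ByteBuffer buf = ByteBuffer.allocateDirect(32 * 1024);
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            while (in.read(buf) > 0) {
                buf.flip();
                while (buf.hasRemaining()) // write() may not drain the buffer in one call
                    out.write(buf);
                buf.clear();
            }
        }
        boolean same = java.util.Arrays.equals(payload, Files.readAllBytes(dst));
        System.out.println("copy matches: " + same);
    }
}
```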
When you have a loop which iterates more than 10,000 times it can trigger the whole method to be compiled to native code. However, your later loops have not been run and cannot be optimised to the same degree. To avoid this issue, place each loop in a different method and run again.
Additionally, you may want to set the order of the ByteBuffer to order(ByteOrder.nativeOrder()) to avoid all the byte swapping when you do a getLong, and read more than 24 bytes at a time (reading very small portions generates many more system calls). Try reading 32*1024 bytes at a time. I would also try getLong on the MappedByteBuffer with native byte order. This is likely to be the fastest.
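A sketch of the bulk-read suggestion follows; sizes and names are illustrative. Note that nativeOrder() only gives correct values if the longs in the file were written in that same byte order, so the sketch writes them that way first:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BulkRead {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bulk", ".dat");
        final int records = 2048; // 2048 records of 24 bytes each

        // write the records in native byte order so getLong() needs no swap
        ByteBuffer out = ByteBuffer.allocate(records * 24).order(ByteOrder.nativeOrder());
        for (int i = 0; i < records; i++) {
            out.putLong(i);
            out.putLong(0);
            out.putLong(0);
        }
        out.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (out.hasRemaining())
                ch.write(out);
        }

        long checksum = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(32 * 1024).order(ByteOrder.nativeOrder());
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) > 0) { // one syscall per 32 KiB, not per record
                buf.flip();
                while (buf.remaining() >= 24) {
                    checksum += buf.getLong(); // no byte swap with native order
                    buf.getLong();
                    buf.getLong();
                }
                buf.compact(); // carry a partial record over to the next read
            }
        }
        System.out.println("checksum: " + checksum); // sum of 0..2047 = 2096128
    }
}
```

The compact() call matters because 32*1024 is not a multiple of 24, so a record can straddle two reads.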
A MappedByteBuffer will always be the fastest, because the operating system associates the OS-level disk buffer with your process memory space. Reading into an allocated direct buffer, by comparison, first loads the block into the OS buffer, then copies the contents of the OS buffer into the allocated in-process buffer.
Your test code also does lots of very small (24 byte) reads. If your actual application does the same, then you'll get an even bigger performance boost from mapping the file, because each of the reads is a separate kernel call. You should see several times the performance by mapping.
As for the direct buffer being slower than the java.io reads: you don't give any numbers, but I'd expect a slight degradation because the getLong() calls need to cross the JNI boundary.
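For the scattered-access pattern the question describes, the mapped approach can be sketched like this: absolute getLong(position) calls read straight from the mapped region with no per-record kernel call and no intermediate byte[]. Names and sizes are illustrative; the default big-endian order is kept here to match how the setup code writes the longs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedLongs {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".dat");
        final int records = 1024;

        // setup: 24-byte records (index, 0, 0), big-endian
        ByteBuffer out = ByteBuffer.allocate(records * 24);
        for (int i = 0; i < records; i++) {
            out.putLong(i);
            out.putLong(0);
            out.putLong(0);
        }
        out.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (out.hasRemaining())
                ch.write(out);
        }

        long checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // scattered access: iterate backwards to show position-independent reads
            for (int i = records - 1; i >= 0; i--)
                checksum += mbb.getLong(i * 24); // absolute get, no position() shuffling
        }
        System.out.println("checksum: " + checksum); // sum of 0..1023 = 523776
    }
}
```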