Memory barriers and TLB
Memory barriers guarantee that the data cache will be consistent. However, does it guarantee that the TLB will be consistent?
I am seeing a problem where the JVM (Java 7 update 1) sometimes crashes with memory errors (SIGBUS, SIGSEGV) when passing a MappedByteBuffer between threads.
e.g.
final AtomicReference<MappedByteBuffer> mbbQueue = new AtomicReference<>();
// in a background thread.
MappedByteBuffer map = raf.getChannel().map(MapMode.READ_WRITE, offset, allocationSize);
Thread.yield();
while (!mbbQueue.compareAndSet(null, map));
// the main thread. (more than 10x faster than using map() in the same thread)
MappedByteBuffer mbb = mbbQueue.getAndSet(null);
Without the Thread.yield() I occasionally get crashes in force(), put(), and C's memcpy() all indicating I am trying to access memory illegally. With the Thread.yield() I haven't had a problem, but that doesn't sound like a reliable solution.
Has anyone come across this problem? Are there any guarantees about TLB and memory barriers?
EDIT: The OS is CentOS 5.7; I have seen the behaviour on both i7 and dual-Xeon machines.
Why do I do this? Because the average time to write a message is 35-100 ns depending on length and using a plain write() isn't as fast. If I memory map and clean up in the current thread this takes 50-130 microseconds, using a background thread to do it takes about 3-5 microseconds for the main thread to swap buffers. Why do I need to be swapping buffers at all? Because I am writing many GB of data and ByteBuffer cannot be 2+ GB in size.
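A self-contained sketch of the hand-off pattern described above (the temp file, region size, and region count are illustrative stand-ins; the real code maps successive regions of a multi-GB file): a background thread maps each region ahead of time and publishes it through the AtomicReference, while the main thread takes and writes into it.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;
import java.util.concurrent.atomic.AtomicReference;

public class MappedHandOff {
    static final AtomicReference<MappedByteBuffer> mbbQueue = new AtomicReference<>();

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("handoff", ".dat"); // illustrative temp file
        file.deleteOnExit();
        final int regionSize = 4096; // illustrative; real regions would be much larger
        final int regions = 4;

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            // Background thread: map each region and publish it for the consumer.
            Thread mapper = new Thread(() -> {
                try {
                    for (int i = 0; i < regions; i++) {
                        MappedByteBuffer map = raf.getChannel()
                                .map(MapMode.READ_WRITE, (long) i * regionSize, regionSize);
                        // Spin until the consumer has taken the previous buffer.
                        while (!mbbQueue.compareAndSet(null, map))
                            Thread.yield();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            mapper.start();

            // Main thread: take each mapped region and write into it.
            for (int i = 0; i < regions; i++) {
                MappedByteBuffer mbb;
                while ((mbb = mbbQueue.getAndSet(null)) == null)
                    Thread.yield();
                mbb.put((byte) ('A' + i)); // first byte of the region
                mbb.force();
            }
            mapper.join();

            // Verify the first byte of each region through the file itself.
            for (int i = 0; i < regions; i++) {
                raf.seek((long) i * regionSize);
                System.out.println((char) raf.read());
            }
        }
    }
}
```

The AtomicReference gives safe publication of the buffer reference itself; whether that is sufficient for the underlying mapping is exactly the question being asked here.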
1 Answer
The mapping is done via mmap64 (FileChannel.map). When the address is accessed there will be a page fault and the kernel will read/write there for you; the TLB doesn't need to be updated during mmap.
The TLB (of all CPUs) is invalidated during munmap, which is handled by the finalization of the MappedByteBuffer; hence munmap is costly.
Mapping involves a lot of synchronization, so the address value should not be corrupted.
Any chance you're trying fancy stuff via Unsafe?
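As a side note on the page-fault-on-first-access behaviour mentioned above: MappedByteBuffer.load() attempts to fault every page in up front, so one option (a sketch only, with an illustrative temp file and size; not necessarily a fix for the crash) is to have the background thread call load() before publishing the buffer:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PreFault {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("prefault", ".dat"); // illustrative temp file
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            // Mapping with READ_WRITE grows the file to the mapped size.
            MappedByteBuffer map = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20); // 1 MiB, illustrative
            map.load(); // touch every page now rather than faulting on first access later
            map.put(0, (byte) 42);
            System.out.println(map.get(0));
        }
    }
}
```

Note that load() is best-effort (the OS may evict pages again) and isLoaded() is only a hint, so this moves the page-fault cost into the background thread rather than guaranteeing anything.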