Can multiple threads see writes on a direct mapped ByteBuffer in Java?

Posted 2024-11-28 21:34:22

I'm working on something that uses ByteBuffers built from memory-mapped files (via FileChannel.map()) as well as in-memory direct ByteBuffers. I am trying to understand the concurrency and memory model constraints.

I have read all of the relevant Javadoc (and source) for things like FileChannel, ByteBuffer, MappedByteBuffer, etc. It seems clear that a particular ByteBuffer (and relevant subclasses) has a bunch of fields and the state is not protected from a memory model point of view. So, you must synchronize when modifying state of a particular ByteBuffer if that buffer is used across threads. Common tricks include using a ThreadLocal to wrap the ByteBuffer, duplicate (while synchronized) to get a new instance pointing to the same mapped bytes, etc.
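
For illustration, a minimal sketch of the ThreadLocal-plus-duplicate trick mentioned above (the class and method names are made up, not from the question):

    import java.nio.ByteBuffer;

    class PerThreadViews {
        private final ThreadLocal<ByteBuffer> localView;

        PerThreadViews(ByteBuffer shared) {   // e.g. the MappedByteBuffer for the whole file
            // duplicate() points at the same bytes but has its own position/limit/mark,
            // so each thread can seek independently without corrupting anyone else's view.
            localView = ThreadLocal.withInitial(() -> {
                synchronized (shared) {       // protect the shared instance's own mutable state
                    return shared.duplicate();
                }
            });
        }

        ByteBuffer view() { return localView.get(); }
    }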

Given this scenario:

  1. manager has a mapped byte buffer B_all for the entire file (say it's <2gb)
  2. manager calls duplicate(), position(), limit(), and slice() on B_all to create a new smaller ByteBuffer B_1 that covers a chunk of the file, and gives this to thread T1 (see the sketch just after this list)
  3. manager does all the same stuff to create a ByteBuffer B_2 pointing to the same mapped bytes and gives this to thread T2
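
For concreteness, a rough sketch of steps 1-3 (SliceSetup, fc and chunkSize are made-up names; B_all, B_1 and B_2 follow the question):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    class SliceSetup {
        static void handOutSlices(FileChannel fc, int chunkSize) throws IOException {
            MappedByteBuffer B_all = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());  // step 1

            ByteBuffer dup1 = B_all.duplicate();   // own position/limit, same mapped bytes
            dup1.position(0);
            dup1.limit(chunkSize);
            ByteBuffer B_1 = dup1.slice();         // step 2: hand B_1 to thread T1

            ByteBuffer dup2 = B_all.duplicate();
            dup2.position(chunkSize);
            dup2.limit(2 * chunkSize);
            ByteBuffer B_2 = dup2.slice();         // step 3: hand B_2 to thread T2
            // ... pass B_1 and B_2 to their worker threads ...
        }
    }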

My question is: Can T1 write to B_1 and T2 write to B_2 concurrently and be guaranteed to see each other's changes? Could T3 use B_all to read those bytes and be guaranteed to see the changes from both T1 and T2?

I am aware that writes in a mapped file are not necessarily seen across processes unless you use force() to instruct the OS to write the pages down to disk. I don't care about that. Assume for this question that this JVM is the only process writing a single mapped file.

Note: I am not looking for guesses (I can make those quite well myself). I would like references to something definitive about what is (or is not) guaranteed for memory-mapped direct buffers. Or if you have actual experiences or negative test cases, that could also serve as sufficient evidence.

Update: I have done some tests with having multiple threads write to the same file in parallel and so far it seems those writes are immediately visible from other threads. I'm not sure if I can rely on that though.
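
For what it's worth, this is roughly the shape of such a test (the file name and offset are made up; as noted above, a passing run is evidence of what happens on one platform, not a guarantee):

    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class VisibilityProbe {
        public static void main(String[] args) throws Exception {
            try (FileChannel fc = FileChannel.open(Paths.get("probe.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                fc.write(ByteBuffer.allocate(4096));                        // make sure the file has 4 KiB
                MappedByteBuffer all = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

                ByteBuffer writerView = all.duplicate();                    // the "B_1"-style view for the writer
                new Thread(() -> writerView.put(100, (byte) 0x7F)).start(); // unsynchronized write

                while (all.get(100) != (byte) 0x7F) { }                     // reader spins on the shared view
                System.out.println("saw the write");                        // observed almost immediately in practice
            }
        }
    }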

Answers (7)

醉殇 2024-12-05 21:34:22

Memory mapping with the JVM is just a thin wrapper around CreateFileMapping (Windows) or mmap (posix). As such, you have direct access to the buffer cache of the OS. This means that these buffers are what the OS considers the file to contain (and the OS will eventually synch the file to reflect this).

So there is no need to call force() to sync between processes. The processes are already synched (via the OS - even read/write accesses the same pages). Forcing just synchs between the OS and the drive controller (there can be some delay between the drive controller and the physical platters, but you don't have hardware support to do anything about that).

Regardless, memory mapped files are an accepted form of shared memory between threads and/or processes. The only difference between this shared memory and, say, a named block of virtual memory in Windows is the eventual synchronization to disk (in fact mmap does the virtual memory without a file thing by mapping /dev/null).

Reading/writing memory from multiple processes/threads does still need some synchronization, as processors are able to do out-of-order execution (not sure how much this interacts with JVMs, but you can't make presumptions), but writing a byte from one thread will have the same guarantees as writing to any byte in the heap normally. Once you have written to it, every thread, and every process, will see the update (even through an open/read operation).

For more info, look up mmap in posix (or CreateFileMapping for Windows, which was built almost the same way).

小姐丶请自重 2024-12-05 21:34:22

No. The JVM memory model (JMM) does not guarantee that multiple threads mutating (unsynchronized) data will see each other's changes.

First, given that all the threads accessing the shared memory are in the same JVM, the fact that this memory is being accessed through a mapped ByteBuffer is irrelevant (there is no implicit volatile or synchronization on memory accessed through a ByteBuffer), so the question is equivalent to one about accessing a byte array.

Let's rephrase the question so it's about byte arrays:

  1. A manager has a byte array: byte[] B_all
  2. A new reference to that array is created: byte[] B_1 = B_all, and given to thread T1
  3. Another reference to that array is created: byte[] B_2 = B_all, and given to thread T2

Do writes to B_1 by thread T1 get seen in B_2 by thread T2?

No, such writes are not guaranteed to be seen, without some explicit synchronization between T_1 and T_2. The core of the problem is that the JVM's JIT, the processor, and the memory architecture are free to re-order some memory accesses (not just to piss you off, but to improve performance through caching). All these layers expect the software to be explicit (through locks, volatile or other explicit hints) about where synchronization is required, implying these layers are free to move stuff around when no such hints are provided.
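
To make the rephrased scenario concrete, here is a tiny illustration of the data race (the class name is made up; under the JMM the reading thread below is allowed to spin forever, even though on most hardware it terminates quickly):

    class RaceDemo {
        static byte[] shared = new byte[1];                    // stands in for the mapped bytes

        public static void main(String[] args) {
            new Thread(() -> shared[0] = 42).start();          // T1: plain, unsynchronized write
            new Thread(() -> {
                while (shared[0] != 42) { /* no happens-before: not guaranteed to ever exit */ }
            }).start();                                        // T2: plain, unsynchronized reads
        }
    }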

Note that in practice whether you see the writes or not depends mostly on the hardware and the alignment of the data in the various levels of caches and registers, and how "far" away the running threads are in the memory hierarchy.

JSR-133 was an effort to precisely define the Java Memory Model circa Java 5.0 (and as far as I know it's still applicable in 2012). That is where you want to look for definitive (though dense) answers: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr133.pdf (section 2 is most relevant). More readable stuff can be found on the JMM web page: http://www.cs.umd.edu/~pugh/java/memoryModel/

Part of my answer is asserting that a ByteBuffer is no different from a byte[] in terms of data synchronization. I can't find specific documentation that says this, but I suggest that the java.nio.Buffer doc would mention something about synchronization or volatile if that were applicable. Since the doc doesn't mention this, we should not expect such behavior.

温柔一刀 2024-12-05 21:34:22

The cheapest thing you can do is use a volatile variable. After a thread writes to the mapped area, it should write a value to a volatile variable. Any reading thread should read the volatile variable before reading the mapped buffer. Doing this produces a "happens-before" in the Java memory model.

Note that you have NO guarantee that another process is in the middle of writing something new. But if you want to guarantee that other threads can see something you've written, writing a volatile (followed by reading it from the reading thread) will do the trick.
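
A minimal sketch of that pattern (the class, field and method names are made up; writerView and readerView stand for views of the same mapping, e.g. duplicates of one buffer indexed the same way):

    import java.nio.ByteBuffer;

    class VolatilePublish {
        static volatile int published = -1;          // -1 = nothing published yet

        // Writer thread: plain put into its view of the mapping, then a volatile write.
        static void write(ByteBuffer writerView, int index, byte value) {
            writerView.put(index, value);            // ordinary write to the mapped bytes
            published = index;                       // volatile write: creates the happens-before edge
        }

        // Reader thread: volatile read first, then read its own view of the same mapping.
        static byte read(ByteBuffer readerView) {
            int i = published;                       // volatile read: pairs with the write above
            return i < 0 ? 0 : readerView.get(i);    // guaranteed to see the writer's put at that index
        }
    }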

心是晴朗的。 2024-12-05 21:34:22

I would assume that direct memory provides the same guarantees, or lack of them, as heap memory. If you modify a ByteBuffer which shares an underlying array or direct memory address, a second ByteBuffer in another thread can see the changes, but is not guaranteed to do so.

I suspect even if you use synchronized or volatile, it is still not guaranteed to work, however it may well do so depending on the platform.

A simple way to exchange data between threads is to use an Exchanger.

Based on the example in the Exchanger documentation:

    import java.nio.ByteBuffer;
    import java.util.concurrent.Exchanger;

    class FillAndEmpty {
       final Exchanger<ByteBuffer> exchanger = new Exchanger<ByteBuffer>();
       ByteBuffer initialEmptyBuffer = ... // a made-up type
       ByteBuffer initialFullBuffer = ...

       class FillingLoop implements Runnable {
         public void run() {
           ByteBuffer currentBuffer = initialEmptyBuffer;
           try {
             while (currentBuffer != null) {
               addToBuffer(currentBuffer);              // placeholder: fill the current buffer
               if (currentBuffer.remaining() == 0)
                 currentBuffer = exchanger.exchange(currentBuffer);   // swap the full buffer for an empty one
             }
           } catch (InterruptedException ex) { ... handle ... }
         }
       }

       class EmptyingLoop implements Runnable {
         public void run() {
           ByteBuffer currentBuffer = initialFullBuffer;
           try {
             while (currentBuffer != null) {
               takeFromBuffer(currentBuffer);           // placeholder: drain the current buffer
               if (currentBuffer.remaining() == 0)
                 currentBuffer = exchanger.exchange(currentBuffer);   // swap the empty buffer for a full one
             }
           } catch (InterruptedException ex) { ... handle ... }
         }
       }

       void start() {
         new Thread(new FillingLoop()).start();
         new Thread(new EmptyingLoop()).start();
       }
    }
人心善变 2024-12-05 21:34:22

One possible answer I've run across is using file locks to gain exclusive access to the portion of the disk mapped by the buffer. This is explained with an example here for instance.
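
For reference, a rough sketch of region locking with FileChannel.lock() (the class name, fc, regionStart and regionSize are assumed names, not taken from the linked example):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    class RegionLocking {
        // fc, regionStart and regionSize are assumed to come from wherever the mapping was set up.
        static void writeRegionExclusively(FileChannel fc, long regionStart, long regionSize) throws IOException {
            try (FileLock lock = fc.lock(regionStart, regionSize, /* shared= */ false)) {
                // ... write into the part of the mapped buffer covering this byte range ...
            }   // lock released here; note it is advisory and held by the whole JVM, not by one thread
        }
    }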

I'm guessing that this would really guard the disk section to prevent concurrent writes on the same section of file. The same thing could be achieved (in a single JVM but invisible to other processes) with Java-based monitors for sections of the disk file. I'm guessing that would be faster with the downside of being invisible to external processes.

Of course, I'd like to avoid either file locking or page synchronization if consistency is guaranteed by the jvm/os.

女中豪杰 2024-12-05 21:34:22

I do not think that this is guaranteed. If the Java Memory Model doesn't say that it's guaranteed, it is by definition not guaranteed. I would either guard buffer writes with synchronized or queue writes to a single thread that handles all writes. The latter plays nicely with multicore caching (better to have one writer for each RAM location).
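
A small sketch of the single-writer-thread variant (all names are made up; as a side benefit, handing work through a java.util.concurrent queue also gives the submitting thread a happens-before edge with the writer thread):

    import java.nio.ByteBuffer;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class SingleWriter {
        static final class Write {                       // one queued write request
            final int index; final byte value;
            Write(int index, byte value) { this.index = index; this.value = value; }
        }

        private final BlockingQueue<Write> queue = new LinkedBlockingQueue<>();
        private final ByteBuffer buffer;                 // e.g. the mapped buffer for the whole file

        SingleWriter(ByteBuffer buffer) { this.buffer = buffer; }

        // Any thread may call this; only the writer loop ever touches the buffer.
        void submit(int index, byte value) { queue.add(new Write(index, value)); }

        void runWriterLoop() throws InterruptedException {
            while (true) {
                Write w = queue.take();
                buffer.put(w.index, w.value);
            }
        }
    }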

心凉怎暖 2024-12-05 21:34:22

No, it's no different from normal Java variables or array elements.
