In a sense it's worse than pthread_mutex_init, actually. Because of Java's wait/notify you kind of need a paired mutex and condition variable to implement a monitor.
In practice, when implementing a JVM you hunt down and apply every single platform-specific optimisation in the book, and then invent some new ones, to make monitors as fast as possible. If you can't do a really fiendish job of that, you definitely aren't up to optimising garbage collection ;-)
One observation is that not every object needs to have its own monitor. An object which isn't currently synchronised doesn't need one. So the JVM can create a pool of monitors, and each object could just have a pointer field, which is filled in when a thread actually wants to synchronise on the object (with a platform-specific atomic compare and swap operation, for instance). So the cost of monitor initialisation doesn't have to add to the cost of object creation. Assuming the memory is pre-cleared, object creation can be: decrement a pointer (plus some kind of bounds check, with a predicted-false branch to the code that runs gc and so on); fill in the type; call the most derived constructor. I think you can arrange for the constructor of Object to do nothing, but obviously a lot depends on the implementation.
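The lazy pool-backed association described above can be sketched in C++ with `std::atomic`. This is a hypothetical illustration, not any real JVM's code; the pool interface (`acquire_from_pool`/`return_to_pool`) is an assumed placeholder:

```cpp
#include <atomic>
#include <mutex>

// A monitor pairs a mutex with (in a full version) a condition variable,
// matching Java's wait/notify semantics.
struct Monitor {
    std::mutex mutex;
    // std::condition_variable cv;  // needed for wait/notify
};

// Each object carries only an atomic pointer, empty until first use.
struct ObjectHeader {
    std::atomic<Monitor*> monitor{nullptr};
};

// Assumed pool interface; a real JVM would recycle monitors here.
Monitor* acquire_from_pool() { return new Monitor; }
void return_to_pool(Monitor* m) { delete m; }

Monitor* monitor_for(ObjectHeader& obj) {
    Monitor* m = obj.monitor.load(std::memory_order_acquire);
    if (m) return m;                       // already associated
    Monitor* fresh = acquire_from_pool();
    Monitor* expected = nullptr;
    // Atomic compare-and-swap: exactly one thread wins the association.
    if (obj.monitor.compare_exchange_strong(expected, fresh,
                                            std::memory_order_acq_rel)) {
        return fresh;                      // we won the race
    }
    return_to_pool(fresh);                 // we lost; use the winner's monitor
    return expected;
}
```

Object creation stays cheap because the pointer field starts out null and no monitor is touched until a thread actually synchronises.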
In practice, the average Java application isn't synchronising on very many objects at any one time, so monitor pools are potentially a huge optimisation in time and memory.
I'm not sure how Java does it, but .NET doesn't keep the mutex (or its analog - the structure that holds it is called a "syncblk" there) directly in the object. Rather, it has a global table of syncblks, and each object references its syncblk by an index into that table. Furthermore, objects don't get a syncblk as soon as they're created - instead, it's created on demand on the first lock.
I assume (note, I do not know how it actually does that!) that it uses atomic compare-and-exchange to associate the object and its syncblk in a thread-safe way:
1. Check the hidden `syncblk_index` field of our object for 0. If it's not 0, lock the syncblk it refers to and proceed, otherwise...
2. Create a new syncblk in the global table and get its index (global locks are acquired/released here as needed).
3. Compare-and-exchange to write the index into the object itself.
4. If the previous value was 0 (assume that 0 is not a valid index, and is the initial value for the hidden `syncblk_index` field of our objects), our syncblk creation was not contested. Lock on it and proceed.
5. If the previous value was not 0, then someone else had already created a syncblk and associated it with the object while we were creating ours, and we now have the index of that syncblk. Dispose of the one we've just created, and lock on the one we've obtained.
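The steps above can be sketched in C++; all names here (`SyncBlk`, `g_syncblks`, `syncblk_index`) are illustrative, not the actual CLR internals, and for simplicity the losing thread's syncblk is left in the table rather than disposed of:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

struct SyncBlk { std::mutex lock; };

// Global table of syncblks; slot 0 is reserved so 0 can mean "none".
std::vector<SyncBlk*> g_syncblks{nullptr};
std::mutex g_table_lock;

struct Object {
    std::atomic<uint32_t> syncblk_index{0};  // hidden header field
};

uint32_t create_syncblk() {
    std::lock_guard<std::mutex> g(g_table_lock);  // global table lock
    g_syncblks.push_back(new SyncBlk);
    return static_cast<uint32_t>(g_syncblks.size() - 1);
}

SyncBlk* syncblk_for(Object& obj) {
    uint32_t idx = obj.syncblk_index.load();     // step 1: already present?
    if (idx != 0) return g_syncblks[idx];
    uint32_t fresh = create_syncblk();           // step 2: make a new one
    uint32_t expected = 0;
    // step 3: compare-and-exchange the index into the object
    if (obj.syncblk_index.compare_exchange_strong(expected, fresh)) {
        return g_syncblks[fresh];                // step 4: uncontested
    }
    // step 5: another thread won the race; use its syncblk
    // (a real implementation would dispose of or recycle ours here)
    return g_syncblks[expected];
}
```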
Thus the per-object overhead is 4 bytes (assuming 32-bit indices into the syncblk table) in the best case, but larger for objects which actually have been locked. If you only rarely lock on your objects, this scheme looks like a good way to cut down on resource usage. But if you need to lock on most or all of your objects eventually, storing a mutex directly within the object might be faster.
Surely you don't need such a monitor for every object!
When porting from Java to C++, it strikes me as a bad idea to just copy everything blindly. The best structure for Java is not the same as the best for C++, not least because Java has garbage collection and C++ doesn't.
Add a monitor to only those objects that really need it. If only some instances of a type need synchronization then it's not that hard to create a wrapper class that contains the mutex (and possibly condition variable) necessary for synchronization. As others have already said, an alternative is to use a pool of synchronization objects with some means of choosing one for each object, such as using a hash of the object address to index the array.
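One possible shape for such a wrapper class, using the C++11 (née C++0x) standard library; the name `Monitored` and its interface are my own invention, just a sketch of the idea:

```cpp
#include <condition_variable>
#include <mutex>

// Wraps a value with the mutex and condition variable needed for
// monitor-style synchronisation; only instances that need it pay the cost.
template <typename T>
class Monitored {
public:
    // Analogue of a Java synchronized block: run f under the lock,
    // then wake any waiters in case f changed the state they watch.
    template <typename F>
    void with_lock(F f) {
        std::lock_guard<std::mutex> g(mutex_);
        f(value_);
        cv_.notify_all();
    }

    // Analogue of wait(): block until the predicate holds.
    template <typename Pred>
    void wait_until(Pred p) {
        std::unique_lock<std::mutex> l(mutex_);
        cv_.wait(l, [&] { return p(value_); });
    }

private:
    T value_{};
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Usage is then e.g. `queue.with_lock([](std::deque<int>& q) { q.push_back(1); });` on the instances that need synchronisation, while unsynchronised instances use the plain type.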
I'd use the boost thread library or the new C++0x standard thread library for portability rather than relying on platform specifics at each turn. Boost.Thread supports Linux, MacOSX, win32, Solaris, HP-UX and others. My implementation of the C++0x thread library currently only supports Windows and Linux, but other implementations will become available in due course.
The Sun Hotspot JVM implements thin locks using compare and swap. If an object is locked, then the waiting thread waits on the monitor of the thread which locked the object. This means you only need one heavy lock per thread.
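A very rough sketch of the thin-lock idea, under the assumption (mine, not the answer's) that the header word normally holds the owning thread's id and the heavyweight "inflation" path is elided:

```cpp
#include <atomic>
#include <cstdint>

// Minimal thin lock: the uncontended fast path is a single
// compare-and-swap of the owning thread's id into a header word.
struct ThinLock {
    std::atomic<uint64_t> owner{0};   // 0 == unlocked

    bool try_lock(uint64_t tid) {
        uint64_t expected = 0;
        return owner.compare_exchange_strong(expected, tid);
    }

    // On failure a real JVM would inflate to a heavyweight monitor
    // and park the waiter on the owning thread's monitor; omitted here.

    void unlock(uint64_t tid) {
        uint64_t expected = tid;
        owner.compare_exchange_strong(expected, 0);
    }
};
```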