Memory effects of synchronization in Java

Published 2024-08-14 09:37:26

JSR-133 FAQ says:

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.

I also remember reading that on modern Sun VMs uncontended synchronizations are cheap. I am a little confused by this claim. Consider code like:

class Foo {
    int x = 1;
    int y = 1;
    final Object aLock = new Object();
    // ...

    void bar() {
        synchronized (aLock) {
            x = x + 1;
        }
    }
}

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help. Alternatively, can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock, and hence not drop the cached value of y when entering the synchronized block?

浅浅 2024-08-21 09:37:26

The short answer is that JSR-133 goes too far in its explanation. This isn't a serious issue because JSR-133 is a non-normative document which isn't part of the language or JVM standards. Rather, it is only a document which explains one possible strategy that is sufficient for implementing the memory model, but isn't in general necessary. On top of that, the comment about "cache flushing" is basically totally out of place, since essentially zero architectures would implement the Java memory model by doing any type of "cache flushing" (and many architectures don't even have such instructions).

The Java memory model is formally defined in terms of things like visibility, atomicity, and happens-before relationships, which explain exactly which writes each thread must see, which actions must occur before which other actions, and other relationships, using a precisely (mathematically) defined model. Behavior which isn't formally defined could be random, or could be well-defined in practice on some particular hardware and JVM implementation - but of course you should never rely on this, as it might change in the future, and you could never really be sure it was well-defined in the first place unless you wrote the JVM and were well aware of the hardware semantics.

So the text that you quoted is not formally describing what Java guarantees, but rather is describing how some hypothetical architecture which had very weak memory ordering and visibility guarantees could satisfy the Java memory model requirements using cache flushing. Any actual discussion of cache flushing, main memory and so on is clearly not generally applicable to Java as these concepts don't exist in the abstract language and memory model spec.

In practice, the guarantees offered by the memory model are much weaker than a full flush - having every atomic, concurrency-related or lock operation flush the entire cache would be prohibitively expensive - and this is almost never done in practice. Rather, special atomic CPU operations are used, sometimes in combination with memory barrier instructions, which help ensure memory visibility and ordering. So the apparent inconsistency between cheap uncontended synchronization and "fully flushing the cache" is resolved by noting that the first is true and the second is not - no full flush is required by the Java memory model (and no flush occurs in practice).
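In JMM terms, all that synchronized promises here is a happens-before edge from an unlock to a later lock of the same monitor; how the JVM implements that edge (fences, atomic instructions) is platform-specific. A minimal sketch, with illustrative class and field names:

```java
// Visibility via a happens-before edge, not a cache flush: the unlock at the
// end of set() happens-before a later lock of the same monitor in get(), so
// a reader that locks afterwards is guaranteed to see ready == true.
class Flag {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock; deliberately not volatile

    void set() {
        synchronized (lock) {
            ready = true;
        } // monitor release: publishes the write to subsequent lockers
    }

    boolean get() {
        synchronized (lock) { // monitor acquire: sees writes from prior releases
            return ready;
        }
    }
}
```

Nothing in this guarantee says anything about unrelated cached data being flushed or invalidated.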

If the formal memory model is a bit too heavy to digest (you wouldn't be alone), you can also dive deeper into this topic by taking a look at Doug Lea's cookbook, which is in fact linked in the JSR-133 FAQ, but comes at the issue from a concrete hardware perspective, since it is intended for compiler writers. There, they talk about exactly what barriers are needed for particular operations, including synchronization - and the barriers discussed there can pretty easily be mapped to actual hardware. Much of the actual mapping is discussed right in the cookbook.

伤感在游骋 2024-08-21 09:37:26

BeeOnRope is right: the text you quote delves more into typical implementation details than into what the Java Memory Model does indeed guarantee. In practice, you may often see that y is actually purged from CPU caches when you synchronize on x (the same applies if x in your example were a volatile variable, in which case explicit synchronization is not necessary to trigger the effect). This is because on most CPUs (note that this is a hardware effect, not something the JMM describes) the cache works on units called cache lines, which are usually longer than a machine word (for example, 64 bytes wide). Since only complete lines can be loaded or invalidated in the cache, there is a good chance that x and y will fall into the same line, and that flushing one of them will also flush the other.

It is possible to write a benchmark which shows this effect. Make a class with just two volatile int fields and let two threads perform some operations (e.g. incrementing in a long loop), one on one of the fields and one on the other. Time the operation. Then insert 16 int fields in between the two original fields and repeat the test (16*4 = 64 bytes). Note that an array is just a reference, so an array of 16 elements won't do the trick. You may see a significant improvement in performance because operations on one field will no longer influence the other. Whether this works for you will depend on the JVM implementation and processor architecture. I have seen this in practice on a Sun JVM and a typical x64 laptop; the difference in performance was a factor of several times.
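The benchmark described above can be sketched as follows. Note this is a hardware effect, not a JMM guarantee: field layout is up to the JVM, so the padding (eight longs = 64 bytes) is merely likely to push the two hot fields onto different cache lines on a typical HotSpot/x86 setup, and timings will vary by machine.

```java
// False-sharing sketch: two threads hammer two volatile fields. When the
// fields share a cache line, each write invalidates the other core's copy;
// padding that separates them onto different lines usually removes the cost.
class FalseSharingDemo {
    static final int ITERS = 20_000_000;

    static class Adjacent {
        volatile long a;
        volatile long b; // likely on the same 64-byte line as a
    }

    static class Padded {
        volatile long a;
        long p0, p1, p2, p3, p4, p5, p6, p7; // 64 bytes of padding
        volatile long b; // likely on a different line than a
    }

    // Run two loops concurrently and return elapsed milliseconds.
    static long time(Runnable r1, Runnable r2) {
        Thread t1 = new Thread(r1), t2 = new Thread(r2);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Adjacent s = new Adjacent();
        Padded p = new Padded();
        long shared = time(() -> { for (int i = 0; i < ITERS; i++) s.a++; },
                           () -> { for (int i = 0; i < ITERS; i++) s.b++; });
        long padded = time(() -> { for (int i = 0; i < ITERS; i++) p.a++; },
                           () -> { for (int i = 0; i < ITERS; i++) p.b++; });
        System.out.println("adjacent fields: " + shared + " ms");
        System.out.println("padded fields:   " + padded + " ms");
    }
}
```

Newer JDKs also provide a @Contended annotation for exactly this purpose, though it is restricted to JDK internals unless unlocked with a VM flag.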

梦罢 2024-08-21 09:37:26

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help.

I'm not sure, but I think the answer may be "yes". Consider this:

class Foo {
    int x = 1;
    int y = 1;
    final Object aLock = new Object();
    // ...

    void bar() {
        synchronized (aLock) {
            x = x + 1;
        }
        y = y + 1;
    }
}

Now this code is unsafe, depending on what happens in the rest of the program. However, I think that the memory model means that the value of y seen by bar should not be older than the "real" value at the time of acquisition of the lock. That would imply the cache must be invalidated for y as well as x.

Also can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock?

If the lock is this, this analysis looks like it would be feasible as a global optimization once all classes have been preloaded. (I'm not saying that it would be easy, or worthwhile ...)

In more general cases, the problem of proving that a given lock is only ever used in connection with a given "owning" instance is probably intractable.

中性美 2024-08-21 09:37:26

We are Java developers; we only know virtual machines, not real machines!

Let me theorize about what is happening - but I must say I don't know what I'm talking about.

Say thread A is running on CPU A with cache A, and thread B is running on CPU B with cache B.

  1. Thread A reads y; CPU A fetches y from main memory and saves the value in cache A.

  2. Thread B assigns a new value to 'y'. The VM doesn't have to update main memory at this point; as far as thread B is concerned, it can read and write a local image of 'y'; maybe 'y' is nothing but a CPU register.

  3. Thread B exits a sync block and releases the monitor. (When and where it entered the block doesn't matter.) Thread B has updated quite a few variables up to this point, including 'y'. All those updates must be written to main memory now.

  4. CPU B writes the new value to y's place in main memory. (I imagine that) almost instantly, the information 'main-memory y is updated' is wired to cache A, and cache A invalidates its own copy of y. That must happen really fast in hardware.

  5. Thread A acquires the monitor and enters the sync block - at this point it doesn't have to do anything regarding cache A. 'y' is already gone from cache A. When thread A reads y again, it gets the fresh value from main memory, as assigned by B.

Consider another variable z, which was also cached by A in step (1), but which thread B did not update in step (2). It can survive in cache A all the way to step (5). Access to 'z' is not slowed down by the synchronization.

If the above statements make sense, then indeed the cost isn't very high.

An addition to step (5): thread A may have its own cache which is even faster than cache A - it could keep variable 'y' in a register, for example. That will not be invalidated by step (4), so in step (5) thread A must discard its own such cache upon sync entry. That's not a huge penalty though.

初吻给了烟 2024-08-21 09:37:26

You might want to check the JDK 6.0 documentation:
http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility

Memory Consistency Properties
Chapter 17 of the Java Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:

  • Each action in a thread happens-before every action in that thread that comes later in the program's order.
  • An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
  • A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
  • A call to start on a thread happens-before any action in the started thread.
  • All actions in a thread happen-before any other thread successfully returns from a join on that thread.

So, as stated in the points above: all the changes that happen before an unlock on a monitor are visible to all threads (within their own synchronized blocks) that take a lock on the same monitor. This is in accordance with Java's happens-before semantics.
Therefore, all changes made to y would also be flushed to main memory when some other thread acquires the monitor on 'aLock'.
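Two of the rules quoted above - the start and join rules - can be seen in a tiny sketch that uses no locks and no volatile at all (the class name is illustrative):

```java
// Visibility purely via happens-before: Thread.start() publishes writes made
// before it to the new thread, and a successful join() publishes the thread's
// writes back to the joiner - no synchronized or volatile needed here.
class StartJoinDemo {
    static int data; // a plain field, deliberately unsynchronized

    static int runOnce() {
        data = 41;                           // happens-before t.start(),
        Thread t = new Thread(() -> data++); // so t is guaranteed to read 41
        t.start();
        try {
            t.join();                        // t's increment happens-before the return
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return data;                         // guaranteed to be 42
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```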

段念尘 2024-08-21 09:37:26

synchronized guarantees that only one thread can enter a block of code. But it doesn't guarantee that variable modifications done within the synchronized section will be visible to other threads; only threads that enter a synchronized block on the same monitor are guaranteed to see the changes.
The memory effects of synchronization in Java can be compared with the problem of double-checked locking, with respect to C++ and Java.
Double-checked locking is widely cited and used as an efficient method for implementing lazy initialization in a multi-threaded environment. Unfortunately, it will not work reliably in a platform-independent way when implemented in Java without additional synchronization. When implemented in other languages, such as C++, it depends on the memory model of the processor, the re-orderings performed by the compiler, and the interaction between the compiler and the synchronization library. Since none of these are specified in a language such as C++, little can be said about the situations in which it will work. Explicit memory barriers can be used to make it work in C++, but these barriers are not available in Java.
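One caveat: the "barriers are not available in Java" remark reflects the original, pre-JSR-133 article. Under the Java 5 memory model, a volatile reference supplies exactly the ordering double-checked locking needs, so the idiom can be written safely. A sketch, with an illustrative class name:

```java
// Double-checked locking, valid under the post-JSR-133 memory model:
// the volatile write to instance happens-before any volatile read that
// observes it, so a fully constructed object is always seen.
class Lazy {
    private static volatile Lazy instance;

    static Lazy getInstance() {
        Lazy local = instance;       // one volatile read on the fast path
        if (local == null) {
            synchronized (Lazy.class) {
                local = instance;    // re-check under the lock
                if (local == null) {
                    instance = local = new Lazy();
                }
            }
        }
        return local;
    }
}
```

In most code, a static holder class or an enum singleton is simpler and just as lazy; the volatile form is mainly useful when the initialized object is per-instance rather than static.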
