In Java, how does AtomicInteger.compareAndSet() perform versus the synchronized keyword?

Published 2024-09-15 22:31:32


I was implementing a FIFO queue of request instances (preallocated request objects for speed) and started by using the "synchronized" keyword on the add method. The method was quite short (check if there is room in the fixed-size buffer, then add the value to the array). Using VisualVM, it appeared the thread was blocking more often than I liked ("monitor" to be precise). So I converted the code to use AtomicInteger values for things such as keeping track of the current size, then using compareAndSet() in while loops (as AtomicInteger does internally for methods such as incrementAndGet()). The code now looks quite a bit longer.

What I was wondering is: what is the performance overhead of the shorter code using synchronized versus the longer code without the synchronized keyword (which should never block on a lock)?

Here is the old get method with the synchronized keyword:

public synchronized Request get()
{
    if (head == tail)
    {
        return null;
    }
    Request r = requests[head];
    head = (head + 1) % requests.length;
    return r;
}

Here is the new get method without the synchronized keyword:

public Request get()
{
    while (true)
    {
        int current = size.get();
        if (current <= 0)
        {
            return null;
        }
        if (size.compareAndSet(current, current - 1))
        {
            break;
        }
    }

    while (true)
    {
        int current = head.get();
        int nextHead = (current + 1) % requests.length;
        if (head.compareAndSet(current, nextHead))
        {
            return requests[current];
        }
    }
}

My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc), even though the code is shorter.

Thanks!

4 Answers

失去的东西太少 2024-09-22 22:31:32


My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc)

Yes, in the common case you are right. Java Concurrency in Practice discusses this in section 15.3.2:

[...] at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms, AtomicPseudoRandom reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.

Before we condemn AtomicPseudoRandom as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.

The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)

(On the figures referred to by the text, Figure 15.1 shows that the performance of AtomicInteger and ReentrantLock is more or less equal when contention is high, while Figure 15.2 shows that under moderate contention the former outperforms the latter by a factor of 2-3.)
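The contention-dependent crossover the book describes is easy to probe with a quick harness. The sketch below is admittedly unscientific (for real numbers use a benchmarking tool such as JMH, which controls for warmup and dead-code elimination); the class and method names (`CounterContention`, `time`, `lockedIncrement`) are invented for illustration. It races N threads incrementing a counter, once via CAS and once via a synchronized method:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterContention {
    static final AtomicInteger atomic = new AtomicInteger();
    static int plain;

    // Lock-based counterpart of incrementAndGet().
    static synchronized void lockedIncrement() { plain++; }

    // Run `body` iters times on each of `threads` threads; return elapsed nanos.
    static long time(Runnable body, int threads, int iters) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                for (int i = 0; i < iters; i++) body.run();
                done.countDown();
            }).start();
        }
        long t0 = System.nanoTime();
        start.countDown();   // release all threads at once to maximize contention
        done.await();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4, iters = 1_000_000;
        long cas = time(atomic::incrementAndGet, threads, iters);
        long sync = time(CounterContention::lockedIncrement, threads, iters);
        System.out.printf("CAS: %d ms, synchronized: %d ms%n",
                cas / 1_000_000, sync / 1_000_000);
    }
}
```

The relative timings will vary with core count and JVM version, which is exactly the book's point about contention levels.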

Update: on nonblocking algorithms

As others have noted, nonblocking algorithms, although potentially faster, are more complex, thus more difficult to get right. A hint from section 15.4 of JCiA:

Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables, though designing new ones is a task best left to experts.

Nonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an AtomicReference to represent each link that must be updated atomically.
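The AtomicReference-per-link idea described above is exactly what Treiber's stack does, and JCiP uses it as its canonical nonblocking example (Listing 15.6). A minimal version along those lines, with a small demo `main` added here:

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber's nonblocking stack: every push/pop is a single CAS on the top link.
public class ConcurrentStack<E> {
    private static class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead)); // retry if another thread raced us
    }

    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) return null;           // empty stack
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }

    public static void main(String[] args) {
        ConcurrentStack<Integer> stack = new ConcurrentStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2 (LIFO)
        System.out.println(stack.pop()); // 1
        System.out.println(stack.pop()); // null
    }
}
```

Note how the whole state transition is confined to the single `top` reference, which is what makes one CAS sufficient.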

终止放荡 2024-09-22 22:31:32


I wonder if the JVM already does a few spins before really suspending the thread. It anticipates that well-written critical sections, like yours, are very short and complete almost immediately. Therefore it should optimistically busy-wait for, I don't know, dozens of loops before giving up and suspending the thread. If that's the case, it should behave the same as your second version.

What a profiler shows might be very different from what's really happening in a JVM running at full speed, with all kinds of crazy optimizations. It's better to measure and compare throughput without a profiler.
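The spinning speculated about here does exist in HotSpot (adaptive spinning on contended monitors). Conceptually it resembles the user-level sketch below; this illustrates the idea only and is not the JVM's actual implementation (the class name and spin limit are invented):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a spin-then-yield lock. HotSpot's real adaptive spinning is
// internal to the JVM and considerably more sophisticated.
public class SpinThenYieldLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    private static final int SPIN_LIMIT = 64; // arbitrary illustrative bound

    public void lock() {
        int spins = 0;
        while (!held.compareAndSet(false, true)) {
            if (++spins < SPIN_LIMIT) {
                Thread.onSpinWait(); // busy-wait hint to the CPU (JDK 9+)
            } else {
                // A real JVM would park the thread here; yielding keeps the sketch simple.
                Thread.yield();
                spins = 0;
            }
        }
    }

    public void unlock() {
        held.set(false);
    }

    // Drive the lock from several threads to check mutual exclusion.
    static int runContended(int threads, int iters) throws InterruptedException {
        SpinThenYieldLock lock = new SpinThenYieldLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runContended(2, 100_000)); // prints 200000
    }
}
```

If the critical section usually completes within the spin window, contended threads never suspend, which is why the short synchronized method may behave much like the hand-rolled CAS version.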

猫卆 2024-09-22 22:31:32


Before doing this kind of synchronization optimization, you really need a profiler to tell you that it's absolutely necessary.

Yes, synchronized may under some conditions be slower than atomic operations, but compare your original and replacement methods. The former is really clear and easy to maintain; the latter is definitely more complex. Because of this there may be very subtle concurrency bugs that you will not find during initial testing. I already see one problem: size and head can get out of sync, because although each of those operations is atomic, the combination is not, and sometimes this can lead to an inconsistent state.

So, my advice:

  1. Start simple
  2. Profile
  3. If performance is good enough, leave the simple implementation as is
  4. If you need a performance improvement, then start to get clever (possibly using a more specialized lock at first), and TEST, TEST, TEST

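On the "get clever" step, it's worth noting that the JDK already ships a bounded FIFO that matches the original problem: ArrayBlockingQueue is backed by a fixed-size array and offers both non-blocking (offer/poll) and blocking (put/take) operations. A sketch; the Request class below is a hypothetical stand-in for the asker's type:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class RequestQueueDemo {
    // Hypothetical stand-in for the asker's preallocated Request type.
    static class Request {
        final int id;
        Request(int id) { this.id = id; }
    }

    public static void main(String[] args) {
        // Fixed capacity, like the hand-rolled ring buffer.
        ArrayBlockingQueue<Request> queue = new ArrayBlockingQueue<>(16);

        boolean added = queue.offer(new Request(42)); // returns false if full, never blocks
        Request r = queue.poll();                     // returns null if empty, never blocks

        System.out.println(added + " " + r.id); // prints "true 42"
    }
}
```

Starting from a battle-tested collection and only hand-rolling if profiling proves it inadequate is exactly the simple-first approach recommended above.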
昔日梦未散 2024-09-22 22:31:32


Here's code for a busy wait lock.

import java.util.concurrent.atomic.AtomicBoolean;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BusyWaitLock
{
    private static final boolean LOCK_VALUE = true;
    private static final boolean UNLOCK_VALUE = false;
    private static final Logger log = LoggerFactory.getLogger(BusyWaitLock.class);

    /**
     * Thrown when a thread busy-waits longer than the maximum wait time.
     * @author Rod Moten
     */
    public static class BusyWaitLockException extends RuntimeException
    {
        private static final long serialVersionUID = 1L;

        public BusyWaitLockException(String message)
        {
            super(message);
        }
    }

    private final AtomicBoolean lock = new AtomicBoolean(UNLOCK_VALUE);
    private final long maximumWaitTime;

    /**
     * Create a busy wait lock that uses the default maximum wait time of two minutes.
     */
    public BusyWaitLock()
    {
        this(1000 * 60 * 2); // default is two minutes
    }

    /**
     * Create a busy wait lock that uses the given value as the maximum wait time.
     * @param maximumWaitTime - a positive value that represents the maximum number of milliseconds that a thread will busy wait.
     */
    public BusyWaitLock(long maximumWaitTime)
    {
        if (maximumWaitTime < 1)
            throw new IllegalArgumentException("Max wait time of " + maximumWaitTime + " is too low. It must be at least 1 millisecond.");
        this.maximumWaitTime = maximumWaitTime;
    }

    /**
     * Spin until the lock is acquired, throwing BusyWaitLockException once maximumWaitTime elapses.
     */
    public void lock()
    {
        long startTime = System.currentTimeMillis();
        long lastLogTime = 0; // elapsed time of the last progress log, not an absolute timestamp
        int logMessageCount = 0;
        // Keep spinning while acquisition FAILS (i.e. the CAS returns false).
        while (!lock.compareAndSet(UNLOCK_VALUE, LOCK_VALUE)) {
            long waitTime = System.currentTimeMillis() - startTime;
            if (waitTime - lastLogTime > 5000) {
                log.debug("Waiting for lock. Log message # {}", logMessageCount++);
                lastLogTime = waitTime;
            }
            if (waitTime > maximumWaitTime) {
                log.warn("Wait time of {} exceeds maximum wait time of {}", waitTime, maximumWaitTime);
                throw new BusyWaitLockException("Exceeded maximum wait time of " + maximumWaitTime + " ms.");
            }
        }
    }

    public void unlock()
    {
        lock.set(UNLOCK_VALUE);
    }
}