当前位置：文江博客话题详情

set concurrency Java

Java 中不同类型的线程安全集

发布于 2024-11-24 15:27:05 字数 911 浏览 12 评论 0 原文

Java 中似乎有很多不同的实现和方法来生成线程安全集。一些示例包括

1) CopyOnWriteArraySet

2 ) Collections.synchronizedSet(集合集合)

3) ConcurrentSkipListSet

4) Collections.newSetFromMap(new ConcurrentHashMap())

5)其他Sets的生成方式与(4)类似，

这些例子来自并发模式：Java 6 中的并发集实现

有人可以简单解释一下这些例子和其他例子的区别、优点和缺点吗？我无法理解并保持 Java 标准文档中的所有内容。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你好，陌生人 2024-12-01 15:27:06

通过使用 AtomicReference 可以将 HashSet 的 contains() 性能与 CopyOnWriteArraySet 的并发相关属性结合起来; 并在每次修改时替换整个集合。

实现草图：

public abstract class CopyOnWriteSet<E> implements Set<E> {

    private final AtomicReference<Set<E>> ref;

    protected CopyOnWriteSet( Collection<? extends E> c ) {
        ref = new AtomicReference<Set<E>>( new HashSet<E>( c ) );
    }

    @Override
    public boolean contains( Object o ) {
        return ref.get().contains( o );
    }

    @Override
    public boolean add( E e ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( current.contains( e ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.add( e );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

    @Override
    public boolean remove( Object o ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( !current.contains( o ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.remove( o );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

}

It is possible to combine the contains() performance of HashSet with the concurrency-related properties of CopyOnWriteArraySet by using the AtomicReference<Set> and replacing the whole set on each modification.

The implementation sketch:

public abstract class CopyOnWriteSet<E> implements Set<E> {

    private final AtomicReference<Set<E>> ref;

    protected CopyOnWriteSet( Collection<? extends E> c ) {
        ref = new AtomicReference<Set<E>>( new HashSet<E>( c ) );
    }

    @Override
    public boolean contains( Object o ) {
        return ref.get().contains( o );
    }

    @Override
    public boolean add( E e ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( current.contains( e ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.add( e );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

    @Override
    public boolean remove( Object o ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( !current.contains( o ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.remove( o );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

}

回复收藏 0 原文

暖风昔人 2024-12-01 15:27:06

如果 Javadocs 没有帮助，您可能应该找一本关于数据结构的书或文章来阅读。概览：

每次更改集合时，CopyOnWriteArraySet 都会生成基础数组的新副本，因此写入速度很慢，而迭代器快速且一致。
Collections.synchronizedSet() 使用老式的同步方法调用来使 Set 线程安全。这将是一个低性能的版本。
ConcurrentSkipListSet 通过不一致的批处理操作（addAll、removeAll 等）和迭代器提供高性能写入。
Collections.newSetFromMap(new ConcurrentHashMap()) 具有 ConcurrentHashMap 的语义，我认为它不一定针对读取或写入进行优化，但与 ConcurrentSkipListSet 一样，具有不一致的批处理操作。

回复收藏 0 原文

煮茶煮酒煮时光 2024-12-01 15:27:06

并发弱引用集

另一个变化是线程安全的弱引用。

这样的集合可以方便地在 pub-sub 场景中跟踪订阅者。当订阅者在其他地方超出范围并因此成为垃圾收集的候选者时，订阅者无需费心优雅地取消订阅。弱引用允许订阅者完成向垃圾收集候选者的转变。当垃圾最终被收集时，集合中的条目将被删除。

虽然捆绑类没有直接提供这样的集合，但您可以通过几次调用来创建一个。

首先，我们首先利用 WeakHashMap 类。这显示在 Collections.newSetFromMap。

Set< YourClassGoesHere > weakHashSet = 
    Collections
    .newSetFromMap(
        new WeakHashMap< YourClassGoesHere , Boolean >()
    )
;

地图的值，布尔，在这里是不相关的，因为地图的键构成了我们的集合。

在发布-订阅这样的场景中，如果订阅者和发布者在不同的线程上操作（很可能是这种情况），我们就需要线程安全。

更进一步，将其包装为同步集，以使该集成为线程安全的。传入对 Collections.synchronizedSet。

this.subscribers =
        Collections.synchronizedSet(
                Collections.newSetFromMap(
                        new WeakHashMap <>()  // Parameterized types `< YourClassGoesHere , Boolean >` are inferred, no need to specify.
                )
        );

现在我们可以从生成的集合中添加和删除订阅者。垃圾收集执行后，任何“消失”的订阅者最终都会被自动删除。此执行何时发生取决于 JVM 的垃圾收集器实现，并且取决于当前的运行时情况。有关底层 WeakHashMap 何时以及如何清除过期条目的讨论和示例，请参阅此问题，*Is WeakHashMap不断增长，还是清除垃圾键？
*。

Concurrent set of weak references

Another twist is a thread-safe set of weak references.

Such a set is handy for tracking subscribers in a pub-sub scenario. When a subscriber is going out of scope in other places, and therefore headed towards becoming a candidate for garbage-collection, the subscriber need not be bothered with gracefully unsubscribing. The weak reference allows the subscriber to complete its transition to being a candidate for garbage-collection. When the garbage is eventually collected, the entry in the set is removed.

While no such set is directly provided with the bundled classes, you can create one with a few calls.

First we start with making a Set of weak references by leveraging the WeakHashMap class. This is shown in the class documentation for Collections.newSetFromMap.

Set< YourClassGoesHere > weakHashSet = 
    Collections
    .newSetFromMap(
        new WeakHashMap< YourClassGoesHere , Boolean >()
    )
;

The Value of the map, Boolean, is irrelevant here as the Key of the map makes up our Set.

In a scenario such as pub-sub, we need thread-safety if the subscribers and publishers are operating on separate threads (quite likely the case).

Go one step further by wrapping as a synchronized set to make this set thread-safe. Feed into a call to Collections.synchronizedSet.

this.subscribers =
        Collections.synchronizedSet(
                Collections.newSetFromMap(
                        new WeakHashMap <>()  // Parameterized types `< YourClassGoesHere , Boolean >` are inferred, no need to specify.
                )
        );

Now we can add and remove subscribers from our resulting Set. And any “disappearing” subscribers will eventually be automatically removed after garbage-collection executes. When this execution happens depends on your JVM’s garbage-collector implementation, and depends on the runtime situation at the moment. For discussion and example of when and how the underlying WeakHashMap clears the expired entries, see this Question, *Is WeakHashMap ever-growing, or does it clear out the garbage keys?
*.

回复收藏 0 原文

自找没趣 2024-12-01 15:27:05

CopyOnWriteArraySet 是一个非常简单的实现 - 它基本上有一个数组中的元素列表，当更改列表时，它会复制该数组。此时正在运行的迭代和其他访问继续使用旧数组，避免了读取器和写入器之间同步的必要性（尽管写入本身需要同步）。通常快速的设置操作（尤其是 contains()）在这里相当慢，因为将以线性时间搜索数组。

仅将其用于非常小的集合，这些集合将被经常读取（迭代）并且很少更改。（Swing 的侦听器集是一个示例，但这些并不是真正的集，无论如何都应该仅在 EDT 中使用。）
Collections.synchronizedSet 将简单地在每个集合周围包装一个同步块。原始集合的方法。您不应该直接访问原始集。这意味着该集合中的任何两个方法都不能同时执行（其中一个方法将阻塞，直到另一个完成） - 这是线程安全的，但如果多个线程正在使用该集合，则不会有并发性。如果使用迭代器，通常仍然需要进行外部同步，以避免在迭代器调用之间修改集合时出现 ConcurrentModificationExceptions。性能将类似于原始集的性能（但有一些同步开销，如果并发使用，则会阻塞）。

如果您只有低并发性，并且希望确保所有更改对其他线程立即可见，请使用此选项。
ConcurrentSkipListSet 是并发 SortedSet 实现，大多数基本操作的时间复杂度为 O(log n)。它允许并发添加/删除和读取/迭代，其中迭代可能会或可能不会告知自迭代器创建以来的更改。批量操作只是多个单一调用，并且不是原子完成的 - 其他线程可能只观察其中的一些操作。

显然，只有当你的元素有一定的总顺序时，你才能使用它。
对于高并发情况、不太大的集合（因为 O(log n)），这看起来是一个理想的选择。
对于ConcurrentHashMap（以及从中派生的Set）：这里最基本的选项是（平均而言，如果您有一个又好又快的hashCode()） O(1)（但当许多键具有相同的哈希码时可能会退化为 O(n)），例如 HashMap/HashSet。写入的并发性有限（表已分区，写入访问将在所需分区上同步），而读取访问本身和写入线程完全并发（但可能还看不到当前正在执行的更改的结果）书面）。迭代器自创建以来可能会或可能不会看到更改，并且批量操作不是原子的。
调整大小很慢（对于 HashMap/HashSet），因此您应该通过估计创建时所需的大小来尝试避免这种情况（并使用大约 1/3 以上的大小，因为它会在 3/4 满时调整大小）。

当您有大型集合、良好（且快速）的哈希函数并且可以在创建映射之前估计集合大小和所需的并发性时，请使用此功能。
这里还有其他可以使用的并发地图实现吗？

The CopyOnWriteArraySet is a quite simple implementation - it basically has a list of elements in an array, and when changing the list, it copies the array. Iterations and other accesses which are running at this time continue with the old array, avoiding necessity of synchronization between readers and writers (though writing itself needs to be synchronized). The normally fast set operations (especially contains()) are quite slow here, as the arrays will be searched in linear time.

Use this only for really small sets which will be read (iterated) often and changed seldom. (Swing's listener-sets would be an example, but these are not really sets, and should be only used from the EDT anyway.)
Collections.synchronizedSet will simply wrap a synchronized-block around each method of the original set. You should not access the original set directly. This means that no two methods of the set can be executed concurrently (one will block until the other finishes) - this is thread-safe, but you will not have concurrency if multiple threads are using the set. If you use the iterator, you usually still need to synchronize externally to avoid ConcurrentModificationExceptions when modifying the set between iterator calls. The performance will be like the performance of the original set (but with some synchronization overhead, and blocking if used concurrently).

Use this if you only have low concurrency, and want to be sure all changes are immediately visible to the other threads.
ConcurrentSkipListSet is the concurrent SortedSet implementation, with most basic operations in O(log n). It allows concurrent adding/removing and reading/iteration, where iteration may or may not tell about changes since the iterator was created. The bulk operations are simply multiple single calls, and not done atomically – other threads may observe only some of them.

Obviously you can use this only if you have some total order on your elements.
This looks like an ideal candidate for high-concurrency situations, for not-too-large sets (because of the O(log n)).
For the ConcurrentHashMap (and the Set derived from it): Here most basic options are (on average, if you have a good and fast hashCode()) in O(1) (but might degenerate to O(n) when many keys have the same hash code), like for HashMap/HashSet. There is a limited concurrency for writing (the table is partitioned, and write access will be synchronized on the needed partition), while read access is fully concurrent to itself and the writing threads (but might not yet see the results of the changes currently being written). The iterator may or may not see changes since it was created, and bulk operations are not atomic.
Resizing is slow (as for HashMap/HashSet), thus you should try to avoid this by estimating the needed size on creation (and using about 1/3 more of that, as it resizes when 3/4 full).

Use this when you have large sets, a good (and fast) hash function and can estimate the set size and needed concurrency before creating the map.
Are there other concurrent map implementations one could use here?