Why does it.next() throw java.util.ConcurrentModificationException?

Posted on 2024-12-01 22:08:55

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
    Collection<BooleanClause> C = new HashSet(terms.get(t));
    if (!C.isEmpty()) {
        for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
            BooleanClause c = it.next();
            if(c.isSomething()) C.remove(c);
        }
    }
}

Not an SSCCE, but can you pick up the smell?

Comments (3)

你好,陌生人 2024-12-08 22:08:55

The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:

The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.

Note the last sentence: seeing a ConcurrentModificationException means the set was modified behind the iterator's back, and one possible cause is another thread modifying the collection. The same Javadoc API page also states:

If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:

Set s = Collections.synchronizedSet(new HashSet(...));

I believe the references to the Javadoc are self-explanatory as to what ought to be done next.
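As a minimal sketch of what the quoted Javadoc asks for, the following wraps a HashSet with Collections.synchronizedSet and holds the wrapper's lock for the whole traversal, which the Collections.synchronizedSet documentation requires when iterating. The class name SynchronizedSetDemo, the method removeEvens and the Integer elements are illustrative assumptions, not part of the question's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SynchronizedSetDemo {

    // Removes even numbers from a set created by Collections.synchronizedSet.
    // Individual calls on the wrapper are synchronized, but a traversal spans
    // many calls, so the caller must lock the wrapper for its whole duration.
    static int removeEvens(Set<Integer> shared) {
        int removed = 0;
        synchronized (shared) {
            for (Iterator<Integer> it = shared.iterator(); it.hasNext();) {
                if (it.next() % 2 == 0) {
                    it.remove(); // removal goes through the iterator
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<Integer> shared =
                Collections.synchronizedSet(new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6)));
        int removed = removeEvens(shared);
        System.out.println("removed " + removed + ", remaining " + shared);
    }
}
```

This only helps if every thread reaches the set through the wrapper; a thread that still holds a reference to the backing HashSet bypasses the lock entirely.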

Additionally, in your case, I do not see why you are not using an ImmutableSet instead of creating a HashSet from the terms object (which could be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying key set is being modified). Creating an immutable set gives the current thread its own defensive copy of the original key set.

Note that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case, as the HashSet is probably created in one thread while the underlying collection of the Multimap is modified by other threads). The synchronized collection wrappers guard every call with an internal mutex, and threads that go straight to the backing collection never acquire it. You therefore ought to look at working on a defensive copy of either the key set or the Multimap itself, or at having getTerms return an unmodifiable view created with the unmodifiableMultimap method of the Multimaps class. (An unmodifiable view only stops callers from writing through it; it still reflects changes made to the backing Multimap.) You could also investigate returning a synchronized Multimap, but then again you will need to ensure that every thread acquires the mutex, so that the underlying collection is protected from concurrent modification.

Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.


Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario

This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.

Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.

The possibility arises from using that method together with the Collection's iterator, in this case the variable it, which was initialized with the statement Iterator<BooleanClause> it = C.iterator();

The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an inner class of HashMap, which backs the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it remembers how many structural modifications had been made to the Collection (the HashMap in this case) when it was created, and it updates that number only for removals made through its own Iterator.remove method.

When you invoke remove on the Collection directly and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, because Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification that is tracked by the Collection but not by the Iterator.
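The single-threaded failure described above can be reproduced with a small sketch. It uses plain String elements instead of BooleanClause, and the names FailFastDemo and removeViaCollectionThrows are made up for illustration. The element removed is the first one the iterator returns, so at least one more element is still pending and the following next() call gets the chance to notice the change. Fail-fast behavior is documented as best-effort, so treat this as a demonstration on OpenJDK-style runtimes rather than a guarantee.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class FailFastDemo {

    // Returns true if removing through the set itself, while an iterator
    // is in use, makes a later it.next() throw.
    static boolean removeViaCollectionThrows() {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        try {
            for (Iterator<String> it = set.iterator(); it.hasNext();) {
                String s = it.next();
                set.remove(s); // structural change the iterator is not told about
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("threw: " + removeViaCollectionThrows());
    }
}
```

No second thread is involved anywhere in this sketch; the exception comes purely from mixing the set's remove with an active iterator.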

To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is aware of the modification to the Collection: the Iterator tracks structural modifications made through its own remove method. Your code should therefore look like the following:

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
    // Diamond operator instead of the raw HashSet type, to avoid an unchecked warning.
    Collection<BooleanClause> C = new HashSet<>(terms.get(t));
    if (!C.isEmpty()) {
        for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
            BooleanClause c = it.next();
            if (c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
        }
    }
}
清风无影 2024-12-08 22:08:55

The reason is that you are modifying the collection directly while an iterator over it is still in use.

How it works:

The collection keeps a modification counter, and each iterator keeps its own copy of that counter, taken when the iterator is created. (In OpenJDK these are the fields modCount and expectedModCount.)
1. The collection's counter is incremented for every structural change made to the collection, whether directly or through an iterator.
2. The iterator's copy is brought back in line with the collection's counter only when the change is made through the iterator itself.

So when you call it.remove(), the collection's counter goes up by 1 and the iterator's copy is updated to match.

But when you call collection.remove() directly, only the collection's counter is incremented; the iterator's copy is left unchanged.

The rule is: whenever the iterator's copy does not match the collection's counter, the iterator throws a ConcurrentModificationException on its next call to next() or remove().
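The counter mechanism can be sketched with a toy collection. CountingBag is a hypothetical class written only for this illustration and is not how HashSet is implemented internally; it borrows the field names modCount and expectedModCount from OpenJDK to make the comparison easy to follow.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CountingBag<E> implements Iterable<E> {
    private final List<E> items = new ArrayList<>();
    private int modCount = 0; // bumped on every structural change

    void add(E e) { items.add(e); modCount++; }

    boolean remove(Object o) {
        boolean removed = items.remove(o);
        if (removed) modCount++; // the iterator is NOT told about this
        return removed;
    }

    int size() { return items.size(); }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int cursor = 0;
            private int lastReturned = -1;
            private int expectedModCount = modCount; // snapshot at creation

            @Override public boolean hasNext() { return cursor < items.size(); }

            @Override public E next() {
                checkForComodification();
                if (!hasNext()) throw new NoSuchElementException();
                lastReturned = cursor;
                return items.get(cursor++);
            }

            @Override public void remove() {
                if (lastReturned < 0) throw new IllegalStateException();
                checkForComodification();
                items.remove(lastReturned); // remove by index
                modCount++;
                cursor = lastReturned;
                lastReturned = -1;
                expectedModCount = modCount; // resynchronize with the collection
            }

            private void checkForComodification() {
                if (modCount != expectedModCount) throw new ConcurrentModificationException();
            }
        };
    }

    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("a"); bag.add("b"); bag.add("c");
        for (Iterator<String> it = bag.iterator(); it.hasNext();) {
            if (it.next().equals("b")) it.remove(); // safe: counters stay in step
        }
        System.out.println("size after it.remove(): " + bag.size());
    }
}
```

Calling bag.remove(...) between two next() calls leaves expectedModCount behind modCount, so the second next() throws; calling it.remove() keeps the two equal.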

沙沙粒小 2024-12-08 22:08:55

Vineet Reynolds has explained in great detail why collections throw a ConcurrentModificationException (thread safety, concurrency). Swagatika has explained in great detail how this mechanism is implemented (how the collection and the iterator keep count of the number of modifications).

Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.

You should only consider this part of the HashSet javadoc:

The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.

In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.

Replace

                if(c.isSomething()) C.remove(c);

with

                if(c.isSomething()) it.remove();

If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:

Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
    Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
    Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}

PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.
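If Guava is not available, Java 8 and later offer Collection.removeIf in the JDK itself, which has the same effect as Iterables.removeIf here. The following is a minimal sketch that substitutes String elements for BooleanClause; the class RemoveIfDemo, the helper copyWithout and the "drop" prefix test are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class RemoveIfDemo {

    // Copies the source into a temporary HashSet and drops every element
    // matching the predicate. The source collection itself is left untouched.
    static <T> Set<T> copyWithout(Collection<T> source, Predicate<? super T> ignored) {
        Set<T> copy = new HashSet<>(source);
        copy.removeIf(ignored); // the collection performs the removal internally
        return copy;
    }

    public static void main(String[] args) {
        List<String> clauses = Arrays.asList("keep-1", "drop-1", "keep-2");
        Set<String> kept = copyWithout(clauses, c -> c.startsWith("drop"));
        System.out.println(kept + " from " + clauses);
    }
}
```

As with the Guava version, only the temporary copy changes; the original collection keeps all of its elements.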
