Java 集合包含所有奇怪的行为
我有以下代码,其中我使用 superList 和 subList ,我想检查 subList 实际上是 superList 的子列表。
我的对象没有实现 hashCode 或 equals 方法。我在测试中也创造了类似的情况。当我运行测试时,结果显示 JDK 集合和普通集合的结果之间存在很大的性能差异。运行测试后,我得到以下输出。
Java Collection API 8953 毫秒的时间流逝结果为真 Commons Collection API 的时间流逝 78 毫秒 &结果是 true
我的问题是为什么 java collection 在处理 containsAll 操作时如此缓慢。我在那里做错了什么吗?我无法控制我从遗留代码中获取的集合类型。我知道如果我使用 HashSet 作为 superList,那么使用 JDK containsAll 操作我会获得很大的性能提升,但不幸的是这对我来说是不可能的。
package com.mycompany.tests;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;
public class CollectionComparison_UnitTest {
private Collection<MyClass> superList = new ArrayList<MyClass>();
private Collection<MyClass> subList = new HashSet<MyClass>(50000);
@Before
public void setUp() throws Exception {
for (int i = 0; i < 50000; i++) {
MyClass myClass = new MyClass(i + "A String");
superList.add(myClass);
subList.add(myClass);
}
@Test
public void testIt() {
long startTime = System.currentTimeMillis();
boolean isSubList = superList.containsAll(subList);
System.out.println("Time Lapsed with Java Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
startTime = System.currentTimeMillis();
isSubList = CollectionUtils.isSubCollection(subList, superList);
System.out.println("Time Lapsed with Commons Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
}
}
class MyClass {
String myString;
MyClass(String myString) {
this.myString = myString;
}
String getMyString() {
return myString;
}
}
I have following code , where I am using superList and subList , I want to check that subList is actually a subList of superList.
My objects do not implement hashCode or equals methods. I have created the similar situation in the test. When I run the test then the result show very big performance difference between results from JDK collection and common collections.After Running the test I am getting following output.
Time Lapsed with Java Collection API 8953 MilliSeconds & Result is true
Time Lapsed with Commons Collection API 78 MilliSeconds & Result is true
My question is why is java collection , so slow in processing the containsAll operation. Am I doing something wrong there? I have no control over collection Types I am getting that from legacy code. I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.
package com.mycompany.tests;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;
public class CollectionComparison_UnitTest {
private Collection<MyClass> superList = new ArrayList<MyClass>();
private Collection<MyClass> subList = new HashSet<MyClass>(50000);
@Before
public void setUp() throws Exception {
for (int i = 0; i < 50000; i++) {
MyClass myClass = new MyClass(i + "A String");
superList.add(myClass);
subList.add(myClass);
}
@Test
public void testIt() {
long startTime = System.currentTimeMillis();
boolean isSubList = superList.containsAll(subList);
System.out.println("Time Lapsed with Java Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
startTime = System.currentTimeMillis();
isSubList = CollectionUtils.isSubCollection(subList, superList);
System.out.println("Time Lapsed with Commons Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
}
}
class MyClass {
String myString;
MyClass(String myString) {
this.myString = myString;
}
String getMyString() {
return myString;
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您至少应该以相反的顺序尝试测试。您的结果很可能只是表明 JIT 编译器做得很好:-)
You should at least try the tests in the opposite order. Your results may very well just show that the JIT compiler is doing its job well :-)
不同的算法:
ArrayList.containsAll()
提供O(N*N),而CollectionUtils.isSubCollection()
提供O(N) +N+N)。Different algorithms:
ArrayList.containsAll()
offers O(N*N), whileCollectionUtils.isSubCollection()
offers O(N+N+N).ArrayList.containsAll
继承自AbstractCollection.containsAll
,是一个检查行中所有元素的简单循环。每一步都是缓慢的线性搜索。我不知道 CollectionUtils 是如何工作的,但是比使用简单循环更快地完成它并不难。将第二个列表转换为HashSet
是肯定的胜利。对两个列表进行排序并并行浏览它们可能会更好。编辑:
CollectionUtils 源代码 说得很清楚。他们将这两个集合转换为“基数映射”,这对于许多操作来说是一种简单而通用的方法。在某些情况下,这可能不是一个好主意,例如,当第一个列表为空或非常短时,您实际上会浪费时间。在你的情况下,与 AbstractCollection.containsAll 相比,这是一个巨大的胜利,但你可以做得更好。
几年后的附录
OP 写道
那是错误的。没有
hashCode
和equals
的类从Object
继承它们,并且可以与HashSet
一起使用 > 一切都很完美。除了每个对象都是唯一的,这可能是无意的和令人惊讶的,但OP的测试superList.containsAll(subList)
做了完全相同的事情。所以快速的解决方案是
ArrayList.containsAll
is inherited fromAbstractCollection.containsAll
and is a simple loop checking all elements in row. Each step is a slow linear search. I don't know howCollectionUtils
works, but it's not hard to do it much faster then using the simple loop. Converting the second List to aHashSet
is a sure win. Sorting both lists and going through them in parallel could be even better.EDIT:
The CollectionUtils source code makes it clear. They're converting both collections to "cardinality maps", which is a simple and general way for many operations. In some cases it may not be a good idea, e.g., when the first list is empty or very short, you in fact loose time. In you case it's a huge win in comparison to AbstractCollection.containsAll, but you could do even better.
Addendum years later
The OP wrote
and that's wrong. Classes without
hashCode
andequals
inherit them fromObject
and can be used with aHashSet
and everything works perfectly. Except for that each object is unique, which may be unintended and surprising, but the OP's testsuperList.containsAll(subList)
does exactly the same thing.So the quick solutions would be