I have a performance situation at hand.
I have a huge amount of data to hold in memory in a two-dimensional table format (12000 x 2000). As far as I know, I can use either int[][] or List<List<Integer>>, and I would access the values with array[i][j] or list.get(i).get(j) respectively. I loop through the entire data set at least five times.
Which one do you think will be faster and, if you can answer, why? Also, is there any way to speed up the execution?
My java -version gives:
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) Client VM (build 20.4-b02, mixed mode, sharing)
The OS is Windows Vista.
The array will almost certainly be faster.
Using an ArrayList will bring the performance more in line, since it's backed by an actual array.
Edit to summarize comments:
For this use case I believe the arrays will be measurably faster. Whether they are faster enough to matter is a different issue, and I don't know enough about the actual problem being solved to make a judgement on that.
1) Benchmark your application as a whole. Don't assume that you know where the perf bottlenecks in your application are. Experience shows again and again and again that humans generally suck at this. Do this on hardware and systems which are identical to production, or you're wasting your time.
2) Don't forget to structure your benchmark in such a way that the JIT compiler has kicked in for the code you care about. On HotSpot a method is typically compiled only after thousands of invocations (the default threshold is 10,000 for the server VM and 1,500 for the client VM). Benchmarking interpreted-mode code is a total waste of time.
3) Once the most significant bottlenecks have been dealt with, many applications end up in a state where the performance profile is dominated by the number of processor L1 cache misses. You can regard this as being the point at which your application is reasonably well-tuned. Your algorithms may still suck however, and there may still be loads of busywork going on in the system that you can get rid of.
4) Assuming that your algorithms don't suck and that you have no major chunks of busywork that you can get rid of, if the array / List difference is truly significant for you then it's at this point that you'll start to see it in the perf numbers.
5) Under most circumstances, you will find that the L1 cache situation will be better for arrays than for lists. However, this is general advice, not to be mistaken for actual performance tuning advice. Generate your own perf numbers and analyse them.
tl;dr version: Read the long version. tl;dr has no place in Java performance discussion - this is subtle and complex stuff and the nuances matter.
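To make point 2 concrete, here is a rough sketch of a hand-rolled harness that warms a method up before timing it. The class name, the table sizes, and the 20000 warm-up calls are assumptions chosen for illustration, not part of the advice above, and a micro-benchmark like this is no substitute for measuring the whole application as point 1 says.

```java
// Hypothetical harness for illustration; class and method names are not from the answer above.
final class WarmupBenchmark {

    // Builds a rows x cols table where cell (i, j) holds i + j.
    static int[][] fill(int rows, int cols) {
        int[][] table = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                table[i][j] = i + j;
            }
        }
        return table;
    }

    // Sums every cell. Returning the result keeps the JIT from discarding the loop.
    static long sum(int[][] table) {
        long total = 0;
        for (int i = 0; i < table.length; i++) {
            int[] row = table[i];
            for (int j = 0; j < row.length; j++) {
                total += row[j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] small = fill(100, 50);
        int[][] table = fill(1200, 200); // scaled-down stand-in for 12000 x 2000

        // Warm-up phase: call the method many times so the JIT has a chance to compile it.
        long sink = 0;
        for (int n = 0; n < 20000; n++) {
            sink += sum(small);
        }

        // Measured phase: five passes, as in the question.
        long start = System.nanoTime();
        for (int pass = 0; pass < 5; pass++) {
            sink += sum(table);
        }
        long elapsedMicros = (System.nanoTime() - start) / 1000;

        // Printing sink makes sure the summed values are actually used.
        System.out.println("5 passes: " + elapsedMicros + " us (checksum " + sink + ")");
    }
}
```

The timings will vary from run to run and between the client and server VMs, so treat a single number with suspicion and repeat the measurement.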
If the list implements RandomAccess (e.g. ArrayList), it causes almost no performance degradation. If you are using a LinkedList, random access to its members can be very expensive.
Lists bring you a very serious benefit: they can grow automatically. Lists are also collections, which gives you certain benefits when copying from one collection to another (e.g. from a map to a list).
So your choice should depend on whether you need your list to grow automatically and whether the performance issues are really that important for you. In most cases they are not.
And one last remark: I think that neither N-dimensional arrays nor nested lists are the best choice. If you need N dimensions where N > 1, create a class and store its instances in a one-dimensional array or collection.
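One possible reading of that last remark, sketched under assumptions: each row of the table becomes an instance of a small class, and the instances go into a one-dimensional collection. The names Row and RowTableDemo are hypothetical, and the answer does not say what the class should contain, so wrapping an int[] is only a guess at the intent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: wraps the primitive values of one table row.
final class Row {
    private final int[] values;

    Row(int columns) {
        values = new int[columns];
    }

    int get(int column) {
        return values[column];
    }

    void set(int column, int value) {
        values[column] = value;
    }

    int size() {
        return values.length;
    }
}

final class RowTableDemo {

    // Builds a one-dimensional list of rows; cell (i, j) holds i * cols + j.
    static List<Row> build(int rows, int cols) {
        List<Row> table = new ArrayList<Row>(rows);
        for (int i = 0; i < rows; i++) {
            Row row = new Row(cols);
            for (int j = 0; j < cols; j++) {
                row.set(j, i * cols + j);
            }
            table.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = build(3, 4);
        // Access row 2, column 1 through the class instead of nested lists.
        System.out.println(table.get(2).get(1));
    }
}
```

With this layout the outer container can still grow like a list, while the values themselves stay in primitive arrays and are not boxed.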
...of course, the int[][] will use less memory too. If possible, try using byte[][] or short[][] to further reduce memory usage.
A Java int is always 4 bytes regardless of architecture, so 12000 x 2000 ints equate to about 91MB of element data. If bytes are sufficient, it will be 1/4 of that size. Furthermore, there may be performance improvements as well (architecture-dependent).
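The arithmetic behind that figure can be checked with a few lines. This is only a sketch: the class and method names are made up for illustration, and it counts element data only, ignoring array headers and the per-row references that a real int[][] also carries.

```java
// Hypothetical helper for illustration; counts element payload only.
final class TableMemoryEstimate {

    // rows * cols * bytesPerElement, computed in long to avoid int overflow.
    static long payloadBytes(int rows, int cols, int bytesPerElement) {
        return (long) rows * cols * bytesPerElement;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int rows = 12000;
        int cols = 2000;
        // Element sizes fixed by the Java language: int = 4, short = 2, byte = 1.
        System.out.println("int[][]   MiB: " + toMiB(payloadBytes(rows, cols, 4)));
        System.out.println("short[][] MiB: " + toMiB(payloadBytes(rows, cols, 2)));
        System.out.println("byte[][]  MiB: " + toMiB(payloadBytes(rows, cols, 1)));
    }
}
```

For int this works out to 96,000,000 bytes, roughly 91.6 MiB, which matches the 91MB quoted above; short halves that and byte quarters it.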
It depends on the List implementation you are using. If you are using an ArrayList (the one most people use), then performance is going to be essentially identical to an array. But if you are using a LinkedList, then performance will be significantly worse, because LinkedLists are very slow when it comes to random access.
When you are creating the data, if you are using an ArrayList, you should set the initial capacity of its internal array by passing a number into the constructor. Otherwise, populating the ArrayList will be significantly slower than initializing an array. This is because, when the ArrayList's internal array runs out of space, the ArrayList creates a new, larger array and copies all the elements from the old array into it. This results in significant performance loss.
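As a minimal sketch of the presizing advice, assuming the row and column counts are known up front: the ArrayList(int initialCapacity) constructor is part of the standard library, while the class name PresizedTable and the fill values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; only the ArrayList(int) constructor is the point here.
final class PresizedTable {

    // Builds a rows x cols table where cell (i, j) holds i * cols + j.
    static List<List<Integer>> build(int rows, int cols) {
        // Capacity is passed up front so the backing arrays never need to be regrown.
        List<List<Integer>> table = new ArrayList<List<Integer>>(rows);
        for (int i = 0; i < rows; i++) {
            List<Integer> row = new ArrayList<Integer>(cols);
            for (int j = 0; j < cols; j++) {
                row.add(i * cols + j); // the int is autoboxed to Integer here
            }
            table.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<Integer>> table = build(2, 3);
        System.out.println(table.get(1).get(2));
    }
}
```

Note that the capacity only reserves space; the lists still start empty, so elements have to be added with add rather than set.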
Here is a simple benchmark that shows the primitive arrays to be much faster.
The cost of boxing will make arrays slower though.
Results:
Code:
I think a two-dimensional array will be faster in most cases, but why don't you test it on your specific problem?
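A rough way to run such a test is sketched below. It is not the benchmark from any answer in this thread: the class name, the scaled-down 1200 x 200 size, and the pass counts are all assumptions, and the numbers it prints will depend on your JVM and hardware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison harness; sizes and pass counts are illustrative only.
final class ArrayVsListTiming {

    static int[][] fillArray(int rows, int cols) {
        int[][] table = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                table[i][j] = i + j;
            }
        }
        return table;
    }

    static List<List<Integer>> fillList(int rows, int cols) {
        List<List<Integer>> table = new ArrayList<List<Integer>>(rows);
        for (int i = 0; i < rows; i++) {
            List<Integer> row = new ArrayList<Integer>(cols);
            for (int j = 0; j < cols; j++) {
                row.add(i + j);
            }
            table.add(row);
        }
        return table;
    }

    static long sumArray(int[][] table) {
        long total = 0;
        for (int i = 0; i < table.length; i++) {
            for (int j = 0; j < table[i].length; j++) {
                total += table[i][j];
            }
        }
        return total;
    }

    static long sumList(List<List<Integer>> table) {
        long total = 0;
        for (int i = 0; i < table.size(); i++) {
            List<Integer> row = table.get(i);
            for (int j = 0; j < row.size(); j++) {
                total += row.get(j); // unboxes Integer to int
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] array = fillArray(1200, 200);
        List<List<Integer>> list = fillList(1200, 200);
        long sink = 0;

        // Warm up both code paths before timing either one.
        for (int n = 0; n < 50; n++) {
            sink += sumArray(array) + sumList(list);
        }

        long t0 = System.nanoTime();
        for (int pass = 0; pass < 5; pass++) sink += sumArray(array);
        long t1 = System.nanoTime();
        for (int pass = 0; pass < 5; pass++) sink += sumList(list);
        long t2 = System.nanoTime();

        System.out.println("int[][]:             " + (t1 - t0) / 1000 + " us");
        System.out.println("List<List<Integer>>: " + (t2 - t1) / 1000 + " us");
        System.out.println("checksum: " + sink); // keeps the sums from being optimized away
    }
}
```

Replace the summing loops with whatever your five passes actually do, since the access pattern of the real work matters more than this toy loop.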
There is an extensive discussion on this here:
Array or List in Java. Which is faster?
Here is the benchmark conclusion: