java中大型列表的最佳列表实现是什么
我必须创建一个包含 n 个元素的大列表(最多可达 100,000 个)。列表中的每个元素都是一个相当于列表索引的整数。之后我必须在此列表上调用 Collections.shuffle 。我的问题是,应该使用哪种列表实现(java 集合或 apache 集合)。我的直觉是 ArrayList 可以很好地用在这里。 所有的想法都受到赞赏。 谢谢!
感谢您的投入。我想我还是坚持使用ArrayList。我目前正在使用带有initialCapacity 参数的ArrayList 构造函数,并传递列表的大小。因此,如果原始列表是 100000,我使用 new ArrayList(100000) 创建这个新列表;因此,我认为我没有创建一个数组并执行一个 asList,因为不会有任何大小调整。此外,大多数 apache 集合列表如 GrowthList 和LazyList 不实现 RandomAccess。这肯定会减慢洗牌速度(根据 javadocs)。 FastArrayList 确实实现了 RandomAccess,但 apache 对此类有一个注释,指出“此类不是跨平台的。使用它可能会在某些体系结构上导致意外失败”。
I have to create a large list of n elements (could be up to 100,000). each element in the list is an integer equivalent to the index of the list. After this I have to call Collections.shuffle on this list. My question is, which list implementation (either java collections or apache collections) should be used. My gut feeling is ArrayList can well be used here.
All thoughts are appreciated.
Thanks!
Thanks for the inputs. I think I am sticking to the ArrayList. I am currently using the ArrayList constructor with the initialCapacity param and I pass the size of the list. So if the original list is 100000, I create this new list with new ArrayList(100000); Hence I think I don't have the create an array and do an asList since there won't be any resizing. Also, most of the apache collections Lists like GrowthList & LazyList do not implement RandomAccess. This for sure would slow down the shuffle (as per javadocs). FastArrayList does implement RandomAccess but apache has a note for this class saying "This class is not cross-platform. Using it may cause unexpected failures on some architectures".
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
ArrayList 很可能每个列表元素的开销最少,因此应该是最佳选择。如果您经常需要删除列表中间的项目,这可能是一个更糟糕的选择。
ArrayList most probably has the least overhead per list element, so should be the best choice. It might be a worse choice if you frequently need to delete items in the middle of the list.
引用自 Collections.shuffle javadoc:
因此,如果您没有其他需求,我会选择实现 RandomAccess 的 ArrayList。
Quoted from the Collections.shuffle javadoc:
So if you have no other needs i would go with ArrayList which implements RandomAccess.
创建一个 Integer 数组,然后用
Arrays.asList
为您提供的开销甚至比常规ArrayList
还要少。您节省了整个
int
的空间(……诚然,在这种情况下这绝对算不了什么),但它执行的范围检查确实比“真正的”ArrayList 少,所以访问会稍微快一些。不过,你可能不会注意到什么:)Making an
Integer
array and then wrapping it withArrays.asList
gives you even less overhead than a regularArrayList
.You save one entire
int
worth of space (... which admittedly is absolutely nothing in this context), but it does perform fewer range checks than the "real"ArrayList
, so accessing will be slightly faster. Probably nothing you'll notice, though :)ArrayList
可能没问题,是的 - 但是您使用什么标准来衡量“最佳”?无论如何,它必须有多好?无论这些标准是什么,您在复杂性和“善良”之间的权衡是什么?ArrayList<T>
would probably be fine, yes - but what criteria are you using for "best" anyway? And just how good does it have to be anyway? What are your trade-offs between complexity and "goodness" in whatever those criteria are?Javolution 声称拥有 Java 中最快的 List 实现。但我在这个库中找不到任何随机播放实现,因此您必须手动完成。
Javolution claims to have the fastest List implementation in java. But I couldn't find any shuffle implementation in this library so you will have to do it by hand.
Google 的 Guava 库有一些非常好的原始处理,包括 Ints.asList() 方法返回一个可以打乱顺序的列表。
Guava 项目仍处于部署的初步阶段,尽管代码已经过仔细审查并在 Google 大量使用。您需要从 SVN 检索代码并构建 com .google.common.primitive 类。
Google's Guava library has some really nice primitive handling, including an Ints.asList() method returning a list that may be shuffled.
The Guava project is still at a preliminary stage of deployment, though the code has been carefully reviewed and heavily used at Google. You'll need to retrieve the code from SVN and build the com.google.common.primitive classes.
这是关于您对有关
FastArrayList
的问题的更新。FastArrayList
类(javadoc) 是一个并发列表类。 javadoc 是这样说的:现在您的用例(如所描述的)是单线程的。因此:
简而言之,
FastArrayList
中的“快速”是相对于(比如说)这样做的:回到你原来的问题。
ArrayList
是最简单的快速方法,我怀疑任何其他List
类都能打败它。但是,以下方法可能更快。为什么这样会更快?因为数组上的交换操作比 ArrayList 上的交换操作要快。
整体会更快吗?很难说。这取决于:
asList
包装器上的列表操作的性能是否与ArrayList
相比......以及您执行的操作等。我的建议是要小心“过早优化”。
This is about your update to your question concerning
FastArrayList
.The
FastArrayList
class (javadoc) is a concurrent list class. This is what the javadoc says:Now your use-case (as described) is single-threaded. So:
ArrayList
.In short, the "fast" in
FastArrayList
is relative to (say) doing this:Back to your original question.
ArrayList
is the simplest of the fast ways, and I doubt that any otherList
class will beat it. However, the following approach may be faster.Why might that be faster? Because swap operations on an array will be faster than on an
ArrayList
.Will it be faster overall? Hard to say. It depends on:
asList
wrapper compared to anArrayList
... and what operations you perform, etc.My advice would be to beware of "premature optimization".
有一个名为 GlueList 的新 List 实现,它比 ArrayList 和 LinkedList 非常快。
免责声明:我已经创建了实现。
There is a new List implementation called GlueList which is very fast than ArrayList and LinkedList.
Disclaimer: I've created the implementation.
ArrayList 将是最好的列表。因为数组支持对于交换洗牌中使用的元素非常有效。
但是,如果您确实追求性能,您可能希望考虑使用 int[] 或基于 int[] 的自定义列表,就像所有 List 和 List 标准实现一样,您将把 int 装箱和拆箱为 Integers。
这不会是 suffle 上的问题,因为这只是重新排序指针,但您可能会在不需要时创建 100,000 个对象。假设您在创建之前知道列表的大小,您可以很容易地创建一个包装原始数组的新 List 类。如果用作 java.util.List,您仍然需要对任何 get 方法的返回进行装箱。
ArrayList will be the best List for this. As the array backing will be very efficent for swapping elements used in shuffle.
But if you are really chacing performance you may wish to consider using an int[] or a custom list based in int[] as with all the List and List standard implementations you will be boxing and unboxing ints to Integers.
This will not be an issue on the suffle as this will just be reordering pointers but you will be creating 100,000 objects when you may not need to. Assuming you know the size of your list before creation you can quite easily create a new List class that wraps a primitive array. If used as a java.util.List you will still need to box the return from any get method.
您还可以使用基于内存映射文件的列表实现。在这种实现中,列表并不完全存在于内存中,而是只有大列表的一部分在内存中处于活动状态。
如果达到堆空间限制(主要在 32 位 jvm 中),您可能需要使用内存映射文件使列表无缝推送数据,这比普通文件 I/O 更快。
此 Google 代码 中介绍了一种此类实现,并在 链接。
You can also use memory mapped file based list implementation . In such implementation the list is not fully present in memory but only a section of the huge list will be active in memory.
If you are reaching heap space limitation (mostly in 32 bit jvm) you may be needing to make the list push off data seamlessly using a memory mapped file which will be faster than normal file I/O.
One such implementation is described in this google code and explained in this link.