java中大型列表的最佳列表实现是什么

发布于 2024-08-11 20:00:42 字数 527 浏览 3 评论 0原文

我必须创建一个包含 n 个元素的大列表(最多可达 100,000 个)。列表中的每个元素都是一个相当于列表索引的整数。之后我必须在此列表上调用 Collections.shuffle 。我的问题是,应该使用哪种列表实现(java 集合或 apache 集合)。我的直觉是 ArrayList 可以很好地用在这里。 所有的想法都受到赞赏。 谢谢!

感谢您的投入。我想我还是坚持使用ArrayList。我目前正在使用带有initialCapacity 参数的ArrayList 构造函数,并传递列表的大小。因此,如果原始列表是 100000,我使用 new ArrayList(100000) 创建这个新列表;因此,我认为我没有创建一个数组并执行一个 asList,因为不会有任何大小调整。此外,大多数 apache 集合列表如 GrowthList 和LazyList 不实现 RandomAccess。这肯定会减慢洗牌速度(根据 javadocs)。 FastArrayList 确实实现了 RandomAccess,但 apache 对此类有一个注释,指出“此类不是跨平台的。使用它可能会在某些体系结构上导致意外失败”。

I have to create a large list of n elements (could be up to 100,000). each element in the list is an integer equivalent to the index of the list. After this I have to call Collections.shuffle on this list. My question is, which list implementation (either java collections or apache collections) should be used. My gut feeling is ArrayList can well be used here.
All thoughts are appreciated.
Thanks!

Thanks for the inputs. I think I am sticking to the ArrayList. I am currently using the ArrayList constructor with the initialCapacity param and I pass the size of the list. So if the original list is 100000, I create this new list with new ArrayList(100000); Hence I think I don't have the create an array and do an asList since there won't be any resizing. Also, most of the apache collections Lists like GrowthList & LazyList do not implement RandomAccess. This for sure would slow down the shuffle (as per javadocs). FastArrayList does implement RandomAccess but apache has a note for this class saying "This class is not cross-platform. Using it may cause unexpected failures on some architectures".

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

面如桃花 2024-08-18 20:00:42

ArrayList 很可能每个列表元素的开销最少,因此应该是最佳选择。如果您经常需要删除列表中间的项目,这可能是一个更糟糕的选择。

ArrayList most probably has the least overhead per list element, so should be the best choice. It might be a worse choice if you frequently need to delete items in the middle of the list.

幸福%小乖 2024-08-18 20:00:42

引用自 Collections.shuffle javadoc:

此方法以线性时间运行。如果指定的列表未实现 RandomAccess 接口并且很大,则此实现会在打乱指定列表之前将其转储到数组中,并将打乱后的数组转储回列表中。这避免了因对“顺序访问”列表进行改组而导致的二次行为。

因此,如果您没有其他需求,我会选择实现 RandomAccess 的 ArrayList。

Quoted from the Collections.shuffle javadoc:

This method runs in linear time. If the specified list does not implement the RandomAccess interface and is large, this implementation dumps the specified list into an array before shuffling it, and dumps the shuffled array back into the list. This avoids the quadratic behavior that would result from shuffling a "sequential access" list in place.

So if you have no other needs i would go with ArrayList which implements RandomAccess.

月野兔 2024-08-18 20:00:42

创建一个 Integer 数组,然后用 Arrays.asList 为您提供的开销甚至比常规 ArrayList 还要少。

List<Integer> makeList(int size){
    if (size < 0) throw new IllegalArgumentException();
    Integer[] arr = new Integer[size];
    for (int i = 0; i < arr.length; ++i) arr[i] = i;
    List<Integer> list = Arrays.asList(arr);
    Collection.shuffle(list);
    return list;
}

您节省了整个 int 的空间(……诚然,在这种情况下这绝对算不了什么),但它执行的范围检查确实比“真正的”ArrayList 少,所以访问会稍微快一些。不过,你可能不会注意到什么:)

Making an Integer array and then wrapping it with Arrays.asList gives you even less overhead than a regular ArrayList.

List<Integer> makeList(int size){
    if (size < 0) throw new IllegalArgumentException();
    Integer[] arr = new Integer[size];
    for (int i = 0; i < arr.length; ++i) arr[i] = i;
    List<Integer> list = Arrays.asList(arr);
    Collection.shuffle(list);
    return list;
}

You save one entire int worth of space (... which admittedly is absolutely nothing in this context), but it does perform fewer range checks than the "real" ArrayList, so accessing will be slightly faster. Probably nothing you'll notice, though :)

浴红衣 2024-08-18 20:00:42

ArrayList 可能没问题,是的 - 但是您使用什么标准来衡量“最佳”?无论如何,它必须有多好?无论这些标准是什么,您在复杂性和“善良”之间的权衡是什么?

ArrayList<T> would probably be fine, yes - but what criteria are you using for "best" anyway? And just how good does it have to be anyway? What are your trade-offs between complexity and "goodness" in whatever those criteria are?

月下伊人醉 2024-08-18 20:00:42

Javolution 声称拥有 Java 中最快的 List 实现。但我在这个库中找不到任何随机播放实现,因此您必须手动完成。

Javolution claims to have the fastest List implementation in java. But I couldn't find any shuffle implementation in this library so you will have to do it by hand.

你列表最软的妹 2024-08-18 20:00:42

Google 的 Guava 库有一些非常好的原始处理,包括 Ints.asList() 方法返回一个可以打乱顺序的列表。

Guava 项目仍处于部署的初步阶段,尽管代码已经过仔细审查并在 Google 大量使用。您需要从 SVN 检索代码并构建 com .google.common.primitive 类。

Google's Guava library has some really nice primitive handling, including an Ints.asList() method returning a list that may be shuffled.

The Guava project is still at a preliminary stage of deployment, though the code has been carefully reviewed and heavily used at Google. You'll need to retrieve the code from SVN and build the com.google.common.primitive classes.

旧人哭 2024-08-18 20:00:42

这是关于您对有关 FastArrayList 的问题的更新。

FastArrayList 确实实现了 RandomAccess,但 apache 对此类有一个注释,指出“该类不是跨平台的。使用它可能会在某些体系结构上导致意外失败”。

FastArrayList 类(javadoc) 是一个并发列表类。 javadoc 是这样说的:

java.util.ArrayList的定制实现,旨在操作
在多线程环境中,大多数方法
调用是只读的,而不是结构变化。当操作于
“快速”模式,读调用是非同步的,写调用执行
执行以下步骤:

  1. 克隆现有集合
  2. 在克隆上执行修改
  3. 用(修改后的)克隆替换现有集合

[...]

注意:如果您仅在一个数组中创建和访问 ArrayList
单线程,你应该直接使用java.util.ArrayList(没有
同步),以获得最佳性能。

注意:此类不是跨平台的[由于以下问题
快速模式和多线程
]

现在您的用例(如所描述的)是单线程的。因此:

  • “跨平台”问题并不相关,因为它只影响多线程用例。
  • 第一个“注意”(清楚地)表明,对于单线程应用程序,最好使用 ArrayList。

简而言之,FastArrayList 中的“快速”是相对于(比如说)这样做的:

  List<String> myConcurrentlList = Collections.synchronizedList(new ArrayList<>());

回到你原来的问题。 ArrayList 是最简单的快速方法,我怀疑任何其他 List 类都能打败它。但是,以下方法可能更快。

  String[] array = new String[...];
  // populate array
  // shuffle array ... using same algorithm as Collections.shuffle
  for (int i = array.length; i > 1; i--)
      swap(array, i - 1, rnd.nextInt(i));
  }
  List<String> list = Arrays.asList(array);

为什么这样会更快?因为数组上的交换操作比 ArrayList 上的交换操作要快。

整体会更快吗?很难说。这取决于:

  • 像这样创建/填充数组是否是额外的工作
  • asList 包装器上的列表操作的性能是否与 ArrayList 相比......以及您执行的操作等。

我的建议是要小心“过早优化”。

This is about your update to your question concerning FastArrayList.

FastArrayList does implement RandomAccess but apache has a note for this class saying "This class is not cross-platform. Using it may cause unexpected failures on some architectures".

The FastArrayList class (javadoc) is a concurrent list class. This is what the javadoc says:

A customized implementation of java.util.ArrayList designed to operate
in a multithreaded environment where the large majority of method
calls are read-only, instead of structural changes. When operating in
"fast" mode, read calls are non-synchronized and write calls perform
the following steps:

  1. Clone the existing collection
  2. Perform the modification on the clone
  3. Replace the existing collection with the (modified) clone

[...]

NOTE: If you are creating and accessing an ArrayList only within a
single thread, you should use java.util.ArrayList directly (with no
synchronization), for maximum performance.

NOTE: This class is not cross-platform [due to issues with
fast mode and multiple threads
]

Now your use-case (as described) is single-threaded. So:

  • The "cross-platform" issue is not relevant because it only affects multi-threaded use-cases.
  • The first "NOTE" says (clearly) that for a single-threaded application you are better off using an ArrayList.

In short, the "fast" in FastArrayList is relative to (say) doing this:

  List<String> myConcurrentlList = Collections.synchronizedList(new ArrayList<>());

Back to your original question. ArrayList is the simplest of the fast ways, and I doubt that any other List class will beat it. However, the following approach may be faster.

  String[] array = new String[...];
  // populate array
  // shuffle array ... using same algorithm as Collections.shuffle
  for (int i = array.length; i > 1; i--)
      swap(array, i - 1, rnd.nextInt(i));
  }
  List<String> list = Arrays.asList(array);

Why might that be faster? Because swap operations on an array will be faster than on an ArrayList.

Will it be faster overall? Hard to say. It depends on:

  • whether you creating / populating the array like that is / isn't extra work
  • whether the performance of list operations on an asList wrapper compared to an ArrayList ... and what operations you perform, etc.

My advice would be to beware of "premature optimization".

小伙你站住 2024-08-18 20:00:42

有一个名为 GlueList 的新 List 实现,它比 ArrayList 和 LinkedList 非常快。

免责声明:我已经创建了实现。

There is a new List implementation called GlueList which is very fast than ArrayList and LinkedList.

Disclaimer: I've created the implementation.

谁的新欢旧爱 2024-08-18 20:00:42

ArrayList 将是最好的列表。因为数组支持对于交换洗牌中使用的元素非常有效。

但是,如果您确实追求性能,您可能希望考虑使用 int[] 或基于 int[] 的自定义列表,就像所有 List 和 List 标准实现一样,您将把 int 装箱和拆箱为 Integers。

这不会是 suffle 上的问题,因为这只是重新排序指针,但您可能会在不需要时创建 100,000 个对象。假设您在创建之前知道列表的大小,您可以很容易地创建一个包装原始数组的新 List 类。如果用作 java.util.List,您仍然需要对任何 get 方法的返回进行装箱。

ArrayList will be the best List for this. As the array backing will be very efficent for swapping elements used in shuffle.

But if you are really chacing performance you may wish to consider using an int[] or a custom list based in int[] as with all the List and List standard implementations you will be boxing and unboxing ints to Integers.

This will not be an issue on the suffle as this will just be reordering pointers but you will be creating 100,000 objects when you may not need to. Assuming you know the size of your list before creation you can quite easily create a new List class that wraps a primitive array. If used as a java.util.List you will still need to box the return from any get method.

饮湿 2024-08-18 20:00:42

您还可以使用基于内存映射文件的列表实现。在这种实现中,列表并不完全存在于内存中,而是只有大列表的一部分在内存中处于活动状态。
如果达到堆空间限制(主要在 32 位 jvm 中),您可能需要使用内存映射文件使列表无缝推送数据,这比普通文件 I/O 更快。
Google 代码 中介绍了一种此类实现,并在 链接

You can also use memory mapped file based list implementation . In such implementation the list is not fully present in memory but only a section of the huge list will be active in memory.
If you are reaching heap space limitation (mostly in 32 bit jvm) you may be needing to make the list push off data seamlessly using a memory mapped file which will be faster than normal file I/O.
One such implementation is described in this google code and explained in this link.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文