从 List中取出 n 个随机元素?
如何从 ArrayListtake()
方法来获取另一个 x 元素,而无需替换。
How can I take n random elements from an ArrayList<E>
? Ideally, I'd like to be able to make successive calls to the take()
method to get another x elements, without replacement.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(13)
主要有两种方式。
使用
Random#nextInt(int)
:但是,不能保证连续的
n
调用返回唯一的元素。使用
Collections#shuffle()
:它使您能够通过递增索引获取
n
个唯一元素(假设列表本身包含唯一元素)。如果您想知道是否有 Java 8 Stream 方法;不,没有内置的。标准 API 中还没有 Comparator#randomOrder() 之类的东西(还没有?)。您可以尝试类似下面的方法,同时仍然满足严格的
Comparator
契约(尽管分布非常糟糕):最好使用
Collections#shuffle()
代替。Two main ways.
Use
Random#nextInt(int)
:It's however not guaranteed that successive
n
calls returns unique elements.Use
Collections#shuffle()
:It enables you to get
n
unique elements by an incremented index (assuming that the list itself contains unique elements).In case you're wondering if there's a Java 8 Stream approach; no, there isn't a built-in one. There's no such thing as
Comparator#randomOrder()
in standard API (yet?). You could try something like below while still satisfying the strictComparator
contract (although the distribution is pretty terrible):Better use
Collections#shuffle()
instead.到目前为止,大多数提出的解决方案都建议通过检查唯一性并在需要时重试来进行完整列表洗牌或连续随机挑选。
但是,我们可以利用 Durstenfeld 算法(当今最流行的 Fisher-Yates 算法)。
由于上述原因,我们不需要打乱整个列表,而是运行循环的步骤与需要返回的元素数量一样多。如果我们使用完美的随机函数,该算法可确保列表末尾的最后 N 个元素是 100% 随机的。
在许多现实场景中,我们需要从数组/列表中选择预定(最大)数量的随机元素,这种优化方法对于各种纸牌游戏非常有用,例如德州扑克,您先验地知道数字每场比赛使用的卡牌数量;通常,牌组中只需要有限数量的牌。
Most of the proposed solutions till now suggest either a full list shuffle or successive random picking by checking uniqueness and retry if required.
But, we can take advantage of the Durstenfeld's algorithm (the most popular Fisher-Yates variant in our days).
Due to the above, we don't need to shuffle the whole list, but run the loop for as many steps as the number of elements required to return. The algorithm ensures that the last N elements at the end of the list are 100% random if we used a perfect random function.
Among the many real-world scenarios where we need to pick a predetermined (max) amount of random elements from arrays/lists, this optimized method is very useful for various card games, such as Texas Poker, where you a-priori know the number of cards to be used per game; only a limited number of cards is usually required from the deck.
如果您想从列表中连续选取 n 个元素,并且能够在不需要一遍又一遍地替换的情况下执行此操作,那么最好的方法可能是随机排列元素,然后将 n 块中的块取出。如果您随机排列列表,则可以保证您挑选的每个块的统计随机性。也许最简单的方法是使用
Collections.shuffle
。If you want to successively pick n elements from the list and be able to do so without replacement over and over and over again, you are probably best of randomly permuting the elements, then taking chunks off in blocks of n. If you randomly permute the list you guarantee statistical randomness for each block you pick out. Perhaps the easiest way to do this would be to use
Collections.shuffle
.一个公平的方法是遍历列表,在第 n 次迭代时计算是否选择第 n 个元素的概率,这本质上是您仍然需要选择的项目数除以元素数的分数在列表的其余部分中可用。例如:(
这段代码是从我不久前写的 从一个列表。)
A fair way to do this is to go through the list, on the nth iteration calculating the probability of whether or not to pick the nth element, which is essentially the fraction of the number of items you still need to pick over the number of elements available in the rest of the list. For example:
(This code copied from a page I wrote a while ago on picking a random sample from a list.)
简单明了
Simple and clear
正如其他答案中所述,当源列表很大时,由于复制,Collections.shuffle 的效率不是很高。这里有一个 Java 8 的单行代码:
代码:
但是,对于无法快速随机访问的列表(例如 LinkedList),复杂度将为
n*O(list_size)
。As noted in other answers,
Collections.shuffle
is not very efficient when the source list is big, because of the copying. Here is a Java 8 one-liner that:Code:
However, for a list with no quick random access (like LinkedList) the complexity would be
n*O(list_size)
.使用以下类:
Use the following class:
继续选择一个随机元素,并确保不会再次选择相同的元素:
理论上,这可能会无限运行,但实际上没有问题。你越接近整个原始列表,运行时间显然就越慢,但这不是选择随机子列表的重点,不是吗?
Keep selecting a random element and make sure you do not choose the same element again:
In theory this could run endlessly but in practise it is fine. The closer you get the whole original list the slower the runtime of this gets obviously but that is not the point of selecting a random sublist, is it?
下面的类从任何类型的列表中检索 N 个项目。如果您提供种子,那么每次运行时它将返回相同的列表,否则,新列表的项目将在每次运行时更改。您可以通过运行主要方法来检查其行为。
The following class retrieve N items from a list of any type. If you provide a seed then on each run it will return the same list, otherwise, the items of the new list will change on each run. You can check its behaviour my running the main methods.
此解决方案不会修改原始列表,也不会随着列表大小而增加复杂性。
要从 7 个列表中获取 4 个样本,我们只需从所有 7 个元素中随机选择一个元素,然后从剩余 6 个元素中随机选择一个元素,依此类推。如果我们已经选择了索引 4, 0, 3,接下来我们从 0, 1, 2, 3 中生成一个随机数,分别代表索引 1, 2, 5, 6。
This solution doesn't modify the original list or otherwise scale in complexity with the list size.
To get a sample of 4 from a list of 7, we just select a random element out of all 7, then select a random element out of the remaining 6, and so on. If we've already selected indices 4, 0, 3, next we generate a random number out of 0, 1, 2, 3, respectively representing index 1, 2, 5, 6.
所有这些答案都需要可修改的列表,否则会遇到性能问题。
这是一个快速片段,需要 O(k) 额外空间,并保证在 O(k) 时间内运行,并且不需要可修改数组。 (在地图中进行随机播放)
All of these answers require a modifiable list or run into performance issued
Here's a swift snippet that required O(k) additional space and is guaranteed to run in O(k) time and doesn't need a modifiable array. (Performs shuffles in a map)
我对此进行了基准测试,重点关注给出 k 个独特结果的方法。我比较了本页中的 3 种方法:
我没有添加随机播放方法,因为我猜测它显然会比方法 1 或 2 慢。
的列表中获取 3 个项目的结果
从 10: 1:150 ns
2:145 纳秒
3:260 ns
从 1000 个列表中获取 5 个项目的结果:
1:1362 ns
2:209 纳秒
3: 312 ns
看来,对于较大的原始列表,普通方法并没有真正表现出明显的性能下降,而交换方法却如此。显然,随着列表的增加,交换会变得昂贵。
因此,最佳性能只是简单地选择每个随机项目,如果已经选择了该项目,则再次进行绘制。
I've benchmarked this, focusing on the methods which give k unique results. I compared 3 methods from this page:
I didn't add the shuffle approach, because I guessed it would clearly be slower than method 1 or 2.
Results with getting 3 items out of a list of 10:
1: 150 ns
2: 145 ns
3: 260 ns
Results with getting 5 items out of a list of 1000:
1: 1362 ns
2: 209 ns
3: 312 ns
It appears that the plain method doesn't really show significantly slower performances with bigger original lists, whereas the swap method does. Apparently the swapping gets expensive with bigger lists.
So best performance is just plain selecting each random item, and doing the draw again if the item was already selected.
以下方法返回从参数 List 列表中获取的新 Min(n, list.size()) 随机元素列表。请记住,每次调用后都会修改列表列表。因此,每次调用都将“消耗”原始列表,从中返回 n 个随机元素:
示例用法:
示例输出:
The following method returns a new List of Min(n, list.size()) random elements taken from the paramenter List list. Have in mind that the List list is being modified after each call. Therefore, each call will be "consuming" the original list returning n random elements from it:
Sample usage:
Sample output: