一次遍历获取单向链表中的随机元素
我有一个单向链表,但不知道它的大小。
我想在这个列表中获取一个随机元素,并且我只有一次机会遍历该列表。 (不允许遍历两次及以上)
这道题的算法是什么?谢谢!
I have a single direction linked list without knowing its size.
I want to get a random element in this list, and I just have one time chance to traverse the list. (I am not allowed to traverse twice or more)
What’s the algorithm for this problem? Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这只是对大小为 1 的水库进行的水库采样。
本质上它非常简单,
这是均匀采样的,因为在一天结束时选择任何元素的概率是 1/n(供读者练习)。
This is just reservoir sampling with a reservoir of size 1.
Essentially it is really simple
This is uniformly sampled, since the probability of picking any element at the end of the day is 1/n (exercise to the reader).
这可能是一个面试问题。数据科学家使用油藏采样将大量数据中的相关数据存储在有限的存储空间中。
如果您必须从任何具有元素 n 的数组中收集 k 个元素,以便收集到的每个元素的概率应该相同 (k/n),请执行以下两个步骤:
1) 将前 k 个元素存储在存储中。
2)当下一个元素(k+1)来自流时,显然你的集合中已经没有空间了。生成一个从o到n的随机数,如果生成的随机数小于k假设l,则替换storage[l ] 与流中的 (k+1) 个元素。
现在,回到你的问题,这里存储大小是 1。所以你将选择第一个节点,迭代列表中的第二个元素。现在生成随机数,如果它是 1,则保留样本,否则将存储元素从列表
This is probably an interview question.Reservoir sampling is used by data scientist to store relevant data in limited storage from large stream of data.
If you have to collect k elements from any array with elements n, such that you probability of each element collected should be same (k/n), you follow two steps,
1) Store first k elements in the storage.
2) When the next element(k+1) comes from the stream obviously you have no space in your collection anymore.Generate a random number from o to n, if the generated random number is less than k suppose l, replace storage[l] with the (k+1) element from stream.
Now, coming back to your question, here storage size is 1.So you will pick the first node,iterate over the list for second element.Now generate the random number ,if its 1, leave the sample alone otherwise switch the storage element from list
这个问题可以通过水库采样来完成。它基于从 n 个项目中选择 k 个随机项目,但这里 n 可以非常大(不必适合内存!)并且(如您的情况)最初未知。
维基百科有一个可以理解的算法,我在下面引用:
这个问题只需要 1 个值,所以我们取 k=1。
C 实现:
https://ideone.com/txnsas
This question can be done using reservoir sampling. It is based on choosing k random items out of n items, but here n can be very large(which doesn't has to fit in memory!) and (as in your case) unknown initially.
The wikipedia has an understandable algorithm which i quote below:
The question requires only 1 value so we take k=1.
C implementation :
https://ideone.com/txnsas
这是我发现的最简单的方法,它工作正常并且易于理解:
This is the easiest way that I have found, it works fine and is understandable: