有效地从链式哈希表中选取随机元素?
只是为了练习(而不是作为家庭作业)我一直在尝试解决这个问题(CLRS,第 3 版,练习 11.2-6):
假设我们在大小为 m 的哈希表中存储了 n 个键,其中 通过链接解决冲突,并且我们知道每个的长度 链,包括最长链的长度L。描述一个 从密钥中均匀随机选择一个密钥的过程 放入哈希表中,并在预计时间 O(L * (1 + m/n)) 内返回。
到目前为止我的想法是每个键被返回的概率是 1/n。如果我们尝试获取 1 到 n 之间的随机值 x,并尝试首先按存储桶排序,然后沿着存储桶中的链查找序列中的第 x 个键,则需要 O(m) 才能找到正确的存储桶:一个接一个地遍历桶并在 O(L) 时间内获得链中正确的密钥。
Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6):
Suppose we have stored n keys in a hash table of size m, with
collisions resolved by chaining, and that we know the length of each
chain, including the length L of the longest chain. Describe a
procedure that selects a key uniformly at random from among the keys
in the hash table and returns it in expected time O(L * (1 + m/n)).
What I thought so far is that the probability of each key being returned is 1/n. If we try to get a random value x between 1 to n, and try to find the xth key in sequence first sorted by bucket then along the chain in the bucket, then it will take O(m) to find the right bucket by going through buckets one by one and O(L) time to get the right key in chain.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
重复以下步骤,直到返回一个元素:
k
为桶中元素的数量。1 ... L
中均匀随机选择p
。如果p <= k
,则返回存储桶中的第p
元素。应该清楚的是,该过程均匀地随机返回一个元素。我们正在对二维数组中的元素应用拒绝采样。
每个存储桶的预期元素数量为
n / m
。采样尝试成功的概率为(n / m) / L
。因此,找到存储桶所需的预期尝试次数为 L * m / n。加上从存储桶中检索元素的O(L)
成本,预期运行时间为O(L * (1 + m / n))
。Repeat the following steps until an element is returned:
k
be the number of elements in the bucket.p
uniformly at random from1 ... L
. Ifp <= k
then return thep
th element in the bucket.It should be clear that the procedure returns an element uniformly at random. We are sort of applying rejection sampling to the elements placed in a 2D array.
The expected number of elements per bucket is
n / m
. The probability that the sampling attempt succeeds is(n / m) / L
. The expected number of attempts needed to find a bucket is thereforeL * m / n
. Together with theO(L)
cost of retrieving the element from the bucket, the expected running time isO(L * (1 + m / n))
.