Selecting a random element of the power set
For a problem that I'm working on right now, I would like a reasonably uniform random choice from the power set of a given set. Unfortunately this runs right into statistics, which I have not studied at all (something I need to correct now that I'm getting into real programming), so I wanted to run my solution past some people who know the subject.
If the given set has size n, then there are (n k) = n!/[k!(n-k)!] subsets of size k, and the total size N of the power set is the sum of (n k) over k from 0 to n. (This is also equal to 2^n, but I don't think that's of use here. I could obviously be wrong.)
So my plan is to partition [0, 1] into the intervals:
[0, (n 0)/N]
((n 0)/N, [(n 0) + (n 1)]/N]
([(n 0) + (n 1)]/N, [(n 0) + (n 1) + (n 2)]/N]
...
([N - (n n)]/N, 1]
Algorithmically, each interval is constructed by taking the greatest element of the previous interval as the greatest lower bound of the new interval, and adding (n j)/N to it to obtain the new interval's greatest element. I hope that's clear.
I can then figure out how many elements are in the random subset by choosing a uniform float in [0, 1] and mapping it to the index of the interval that it belongs to. From there, I can choose a random subset of the appropriate size.
I'm pretty sure (from a merely intuitive perspective) that my scheme provides a uniform choice on the size of the subset, in the sense that each size is weighted by its share of the total number of subsets. It's plainly not uniform on the set {0, 1, ..., n} of sizes.
I'm using a library function (Python's random.sample) to get the subset of the given size, so I'm confident that that will be uniform.
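The two-stage scheme described above might be sketched as follows. This is an illustration, not the asker's actual code: the function name `random_subset_two_stage` is hypothetical, and the partition of [0, 1] is replaced by the equivalent partition of the integers 0 to 2^n - 1 into blocks of length (n k), which is an assumption made here to sidestep floating-point rounding.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def random_subset_two_stage(items, rng=random):
    """Pick a size k with probability C(n, k) / 2**n, then a k-subset."""
    items = list(items)
    n = len(items)
    # Cumulative counts C(n,0), C(n,0)+C(n,1), ..., 2**n play the role
    # of the interval endpoints, scaled up by N = 2**n.
    cumulative = list(accumulate(math.comb(n, k) for k in range(n + 1)))
    r = rng.randrange(2 ** n)         # uniform integer in [0, 2**n)
    k = bisect_right(cumulative, r)   # index of the block containing r
    # random.sample draws k distinct elements without replacement.
    return rng.sample(items, k)


print(random_subset_two_stage(["a", "b", "c", "d", "e"]))
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.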
So my question is whether putting the two together in the way I'm describing makes the choice of a random subset of random size uniform. If the answer is a lot of work, then I'm happy to accept pointers as to how this might be proven and do the work myself. Also, if there's a better way to do this, then I would of course be happy with that.
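One way the uniformity could be checked is a short calculation, sketched here under the assumption that the second stage picks each subset of size k with equal probability (which is what sampling without replacement is expected to give). For a fixed subset S of size k:

```latex
P(S) \;=\; P(\text{size} = k)\cdot P(S \mid \text{size} = k)
     \;=\; \frac{\binom{n}{k}}{2^{n}} \cdot \frac{1}{\binom{n}{k}}
     \;=\; \frac{1}{2^{n}}
```

The result does not depend on k, so under that assumption every one of the 2^n subsets is equally likely.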
Comments (2)
I think you're going about this the long way. You were close when you mentioned that the size of the power set is 2^n. If you want to select a random element of the power set of a set of size n, generate a random integer in the range [0, 2^n) and use the binary representation of the integer to select the appropriate element from the power set.
For example, suppose S = {a, b, c, d, e}. The power set then contains 2^5 = 32 elements. Generate a random number from 0 to 31, for example 18. The binary representation of 18 is 10010, so, reading the bits from left to right, you would select the first and fourth elements of S. Your random element of the power set is then {a, d}.
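A minimal sketch of this bitmask approach in Python could look like the following. The helper names `subset_from_mask` and `random_subset_bitmask` are hypothetical, and reading the bits from the most significant end is a choice made here to match the worked example; any fixed bit order would serve equally well.

```python
import random


def subset_from_mask(items, mask):
    """Select items[i] when bit i, counted from the left of an n-bit mask, is 1."""
    n = len(items)
    return [x for i, x in enumerate(items) if (mask >> (n - 1 - i)) & 1]


def random_subset_bitmask(items, rng=random):
    """Draw a uniform integer in [0, 2**n) and decode it as a subset."""
    items = list(items)
    mask = rng.randrange(2 ** len(items))
    return subset_from_mask(items, mask)


# The worked example: 18 is 10010 in binary, selecting a and d.
print(subset_from_mask(["a", "b", "c", "d", "e"], 18))  # ['a', 'd']
print(random_subset_bitmask(["a", "b", "c", "d", "e"]))
```

Because each integer in the range corresponds to exactly one subset, a uniform integer gives a uniform subset.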
Consider each element of the given set in turn, and decide with probability 1/2 to include it in the result set.
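As a rough sketch, the coin-flip approach fits in a few lines of Python. The function name `random_subset_coinflips` is hypothetical, and the sketch assumes the flips are independent, so that each particular subset comes out with probability (1/2)^n.

```python
import random


def random_subset_coinflips(items, rng=random):
    """Include each element independently with probability 1/2."""
    return [x for x in items if rng.random() < 0.5]


print(random_subset_coinflips(["a", "b", "c", "d", "e"]))
```

This never needs to build or count the power set, so it works unchanged for large n.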