我有一个模型,其中 M 个状态中的状态 j 以概率 p_j 选择。概率可以是任何实数。这指定了 M 状态的混合模型。我可以在恒定时间内访问所有 j 的 p_j 。
我想制作大量(N)个随机样本。最明显的算法是
1) 计算累积概率分布 P_j = p_1+p_2+...p_j。 O(M)
2) 对于每个样本,选择 [0,1] 中的随机浮点数 x。 O(N)
3) 对于每个样本,选择 j 使得 min(0,P_j-1) < x <= max(1,P_j)。 O(Nlog(M))
因此渐近复杂度为 O(Nlog(M))。 N 的因子显然是不可避免的,但我想知道 log(M) 。在实际实施中是否有可能克服这个因素?
I have a model where state j among M states is chosen with probability p_j. The probability could be any real number. This specifies a mixture model over the M states. I can access p_j for all j in constant time.
I want to make a large number (N) of random samples. The most obvious algorithm is
1) Compute the cumulative probability distribution P_j = p_1+p_2+...p_j. O(M)
2) For each sample choose random float x in [0,1]. O(N)
3) For each sample choose j such that min(0,P_j-1) < x <= max(1,P_j). O(Nlog(M))
So the asymptotic complexity is O(Nlog(M)). The factor of N is obviously unavoidable, but I am wondering about log(M). Is it possible to beat this factor in a realistic implementation?
发布评论
评论(1)
我认为您可以使用以下算法或任何其他合理的多项分布采样器做得更好,
这应该花费接近 O(N) 的时间,但它确实取决于二项分布和混合模型组件采样器的效率。
I think you can do better using something like the following algorithm, or any other reasonable Multinomial distribution sampler,
This should take close to O(N) time but it does depend on how efficient your Binomial distribution and mixture model component samplers are.