Complexity of sampling from a mixture model

Posted 2024-09-27 15:24:39 · 354 words · 0 views · 0 comments


I have a model where state j among M states is chosen with probability p_j. The probabilities can be arbitrary non-negative reals (summing to 1); this specifies a mixture model over the M states. I can access p_j for all j in constant time.
I want to draw a large number (N) of random samples. The most obvious algorithm is

1) Compute the cumulative probability distribution P_j = p_1 + p_2 + ... + p_j. O(M)

2) For each sample, choose a random float x in [0,1). O(N)

3) For each sample, choose j (by binary search on the P_j) such that P_{j-1} < x <= P_j, with P_0 = 0. O(N log(M))

So the asymptotic complexity is O(N log(M)). The factor of N is obviously unavoidable, but I am wondering about the log(M). Is it possible to beat this factor in a realistic implementation?
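For reference, the three steps above can be sketched in Python (names are illustrative; `bisect` performs the binary search of step 3):

```python
import bisect
import itertools
import random

def sample_mixture(p, n, rng=random):
    """Draw n component indices from a mixture with weights p."""
    # Step 1: cumulative distribution P_j = p_1 + ... + p_j.  O(M)
    cdf = list(itertools.accumulate(p))
    total = cdf[-1]  # normalize, in case the weights do not sum to exactly 1
    # Steps 2-3: for each sample, draw x in [0, total) and binary-search
    # for the smallest j with P_j >= x.  O(N log M) overall.
    return [bisect.bisect_left(cdf, rng.random() * total) for _ in range(n)]

rng = random.Random(0)
draws = sample_mixture([0.2, 0.5, 0.3], 10_000, rng)
```

(The standard library's `random.choices` with `cum_weights` implements essentially the same inverse-CDF search.)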


1 comment

扮仙女 2024-10-04 15:24:39


I think you can do better using something like the following algorithm, or any other reasonable multinomial distribution sampler:

// Normalize the weights by the total mass P_M = p_1 + ... + p_M
for j = 1 to M
   p_hat[j] = p[j] / P_M

// Place the draws from the mixture model in this array
draws = [];

// tail holds the probability mass of the components not yet visited
tail = 1.0;
for ( j = 1, remaining = N; j <= M && remaining > 0; j++ )
{
   // Conditioned on none of components 1..j-1 having been chosen,
   // component j is selected with probability p_hat[j] / tail.
   // The `remaining` samples still to be assigned are independent
   // Bernoulli trials, so draw the count for component j from a
   // Binomial(remaining, p_hat[j] / tail) distribution.
   //
   // Dividing by the remaining tail mass guarantees that every
   // sample is assigned, because for the last component
   // p_hat[M] / tail = p_hat[M] / p_hat[M] = 1.0
   items = Binomial.sample( remaining, p_hat[j] / tail );
   remaining -= items;
   tail -= p_hat[j];

   for ( k = 0; k < items; k++ )
      draws.push( sample_from_mixture_component( j ) )
}

This takes O(M + N) time overall (assuming efficient Binomial and mixture-component samplers), so for large N the log(M) factor disappears. One caveat: the draws come out grouped by component, so shuffle `draws` afterwards if you need them in random order.
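For concreteness, here is a runnable Python sketch of the commenter's scheme that returns the per-component counts. To keep the example standard-library-only, the binomial draw is done naively by summing Bernoulli trials; a real implementation would substitute an efficient O(1)-expected-time binomial sampler (e.g. `numpy.random.Generator.binomial`).

```python
import random

def binomial_naive(n, p, rng):
    # Placeholder O(n) binomial sampler; swap in an efficient one
    # for production use.
    p = min(max(p, 0.0), 1.0)  # guard against floating-point drift
    return sum(rng.random() < p for _ in range(n))

def sample_counts(p, n, rng):
    """Return per-component sample counts for n draws from mixture p."""
    total = sum(p)
    p_hat = [w / total for w in p]   # normalize the weights
    counts = [0] * len(p_hat)
    tail = 1.0                       # mass of components not yet visited
    remaining = n
    for j, pj in enumerate(p_hat):
        if remaining == 0:
            break
        # remaining Bernoulli trials, each succeeding with the
        # conditional probability pj / tail
        items = binomial_naive(remaining, pj / tail, rng)
        counts[j] = items
        remaining -= items
        tail -= pj
    # Any residue from floating-point drift goes to the last component
    counts[-1] += remaining
    return counts

rng = random.Random(0)
counts = sample_counts([0.2, 0.5, 0.3], 10_000, rng)
```

Given the counts, the actual draws are produced by calling each component's own sampler `counts[j]` times, exactly as in the pseudocode above.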
