How do I select a value from a list with non-uniform probabilities?
I am looking at the k-means++ initialization algorithm. The following two steps of the algorithm give rise to non-uniform probabilities:
For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.
How can I select with this stated weighted probability distribution in C++?
Comments (4)
Discrete distributions are a lot easier to handle in C++11 with the <random> header and std::discrete_distribution.
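A minimal sketch of how std::discrete_distribution might be used here, assuming the weights are the D(x)^2 values from the question. The helper name count_weighted_draws and its parameters are hypothetical; only std::discrete_distribution and std::mt19937 come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws `draws` indices with probability proportional
// to `weights` and returns how often each index was chosen.
std::vector<std::size_t> count_weighted_draws(const std::vector<double>& weights,
                                              std::size_t draws,
                                              std::mt19937& gen) {
    assert(!weights.empty());
    // discrete_distribution normalizes the weights internally, so they do
    // not need to sum to 1.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    std::vector<std::size_t> counts(weights.size(), 0);
    for (std::size_t i = 0; i < draws; ++i) {
        ++counts[dist(gen)];  // dist(gen) yields an index in [0, weights.size())
    }
    return counts;
}
```

For k-means++, a single call to dist(gen) would give the index of the next center; the counting loop is only there to make the resulting frequencies visible.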
With a finite set of individual data points X, this calls for a discrete probability distribution.
The easiest way to do this is to enumerate the points X in order and compute an array holding their cumulative distribution function, that is, the running sum of their weights.
A function prepare_cdf builds that array once; a function select_point then draws a uniform random number and looks up the first array entry that exceeds it. You call prepare_cdf once, and then call select_point as many times as you need to generate random points.
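The two functions named above could be sketched in C++ roughly as follows. The signatures are assumptions, since the answer only gives the names prepare_cdf and select_point; the binary search via std::upper_bound is one possible lookup, not necessarily the one the answer had in mind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds the running sum of the weights (an unnormalized CDF).
// Assumed signature; weights would be the D(x)^2 values.
std::vector<double> prepare_cdf(const std::vector<double>& weights) {
    assert(!weights.empty());
    std::vector<double> cdf(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cdf.begin());
    assert(cdf.back() > 0.0);  // at least one weight must be positive
    return cdf;
}

// Draws u uniformly from [0, total) and returns the first index whose
// cumulative weight is strictly greater than u (binary search, O(log n)).
std::size_t select_point(const std::vector<double>& cdf, std::mt19937& gen) {
    std::uniform_real_distribution<double> uniform(0.0, cdf.back());
    const double u = uniform(gen);
    const auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    const std::size_t index = static_cast<std::size_t>(it - cdf.begin());
    // Clamp in case rounding makes u land on the total itself.
    return std::min(index, cdf.size() - 1);
}
```

Because the comparison is strict, points with zero weight (such as centers that were already chosen) are skipped by the search.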
I'd take the following approach:
Iterate over the data points, storing their D-squared values in a double distance_squareds[] or std::vector<double> distance_squareds or whatnot, and storing the sum of those values in a double sum_distance_squareds.
Use the drand48 function to choose a random number in [0.0, 1.0), and multiply it by sum_distance_squareds; store the result in random_number.
Iterate over distance_squareds, adding together the values (again), and as soon as the running total meets or exceeds random_number, return the data point corresponding to the D-squared value that you'd just added.
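The steps above might look like this in C++. The function names scan_for_index and select_weighted_index are hypothetical, while the variable names follow the answer. Two details are additions rather than part of the answer: entries with zero weight are skipped, and the last index is returned if round-off lets the loop finish without a match. Note that drand48 is a POSIX function, not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdlib.h>  // drand48 (POSIX, not standard C++)
#include <vector>

// Returns the first index at which the running total of distance_squareds
// meets or exceeds random_number. Zero-weight entries are skipped
// (an addition to the answer's steps).
std::size_t scan_for_index(const std::vector<double>& distance_squareds,
                           double random_number) {
    assert(!distance_squareds.empty());
    double running_total = 0.0;
    for (std::size_t i = 0; i < distance_squareds.size(); ++i) {
        running_total += distance_squareds[i];
        if (distance_squareds[i] > 0.0 && running_total >= random_number) {
            return i;
        }
    }
    // Assumed fallback for floating-point round-off: return the last index.
    return distance_squareds.size() - 1;
}

// Picks an index with probability proportional to its D-squared value.
std::size_t select_weighted_index(const std::vector<double>& distance_squareds) {
    const double sum_distance_squareds =
        std::accumulate(distance_squareds.begin(), distance_squareds.end(), 0.0);
    const double random_number = drand48() * sum_distance_squareds;
    return scan_for_index(distance_squareds, random_number);
}
```

Each selection costs a linear scan, which is usually acceptable for k-means++ because the D(x) values have to be recomputed after every new center anyway.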
Here is something that may help you:
given an array of values (numbers) and an array with the probability of each value (prob), it generates values from numbers with those probabilities (here it simply counts how often each one comes up).
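One way such a generator could be written is sketched below. The array names numbers and prob come from the answer; the functions draw_number and count_draws are hypothetical, and the sketch assumes the entries of prob are non-negative and sum to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical helper: returns numbers[i] with probability prob[i].
// Assumes prob holds non-negative values that sum to 1.
int draw_number(const std::vector<int>& numbers,
                const std::vector<double>& prob,
                std::mt19937& gen) {
    assert(!numbers.empty() && numbers.size() == prob.size());
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double r = uniform(gen);
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        if (r < prob[i]) return numbers[i];
        r -= prob[i];  // move on to the next probability slice
    }
    return numbers.back();  // reached only through floating-point round-off
}

// Draws `draws` values and counts how often each number appears.
std::map<int, int> count_draws(const std::vector<int>& numbers,
                               const std::vector<double>& prob,
                               int draws, std::mt19937& gen) {
    std::map<int, int> counts;
    for (int i = 0; i < draws; ++i) {
        ++counts[draw_number(numbers, prob, gen)];
    }
    return counts;
}
```

To use this for k-means++, the D(x)^2 values would first have to be divided by their sum so that they form a valid prob array.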