Combining two normal random variables

Posted on 2024-10-07 19:03:44

Suppose I have the following two random variables:

X, where mean = 6 and stdev = 3.5
Y, where mean = -42 and stdev = 5

I would like to create a new random variable Z based on these two, knowing that X happens 90% of the time and Y happens 10% of the time.

It is easy to calculate the mean of Z: 0.9 * 6 + 0.1 * -42 = 1.2

But is it possible to generate random values for Z in a single function? Of course, I could do something along these lines:

if (randIntBetween(1,10) > 1)
    GenerateRandomNormalValue(6, 3.5);
else
    GenerateRandomNormalValue(-42, 5);

But I would really like to have a single function that would act as a probability density function for such a random variable (Z), which is not necessarily normal.

Sorry for the crappy pseudo-code.

Thanks for your help!

Edit: here is one concrete question:

Let's say we add up five consecutive values drawn from Z. What would be the probability of ending up with a number higher than 10?

Comments (5)

难忘№最初的完美 2024-10-14 19:03:44

"But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal."

Okay, if you want the density, here it is:

rho = 0.9 * density_of_x + 0.1 * density_of_y
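
As a minimal sketch of the formula above, the density can be written out in Python using only the standard library. The function names `normal_pdf` and `mixture_pdf` are illustrative choices, not part of the original answer, and the parameters are the ones from the question.

```python
import math

def normal_pdf(x, mean, stdev):
    """Density of a normal distribution with the given mean and stdev at x."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, w_x=0.9, w_y=0.1):
    """rho(x) = 0.9 * density_of_x + 0.1 * density_of_y for the question's X and Y."""
    return w_x * normal_pdf(x, 6.0, 3.5) + w_y * normal_pdf(x, -42.0, 5.0)

# Evaluate the density at the two component means.
print(mixture_pdf(6.0), mixture_pdf(-42.0))
```

Because the weights sum to 1, the result is itself a valid density: it is non-negative and integrates to 1.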

But you cannot sample from this density unless you 1) compute its CDF (cumbersome, but not infeasible) and 2) invert it (you will need a numerical solver for this). Alternatively, you can use rejection sampling (or variants, e.g. importance sampling). This is costly and cumbersome to get right.

So you should go for the "if" statement (i.e. call the generator 3 times), unless you have a very strong reason not to (using quasi-random sequences, for instance).
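
The "if" approach could look like the following Python sketch. It assumes the standard `random` module as the generator; `sample_z` is a hypothetical name, and the seed is fixed only to make the run repeatable.

```python
import random

def sample_z(rng):
    """Draw one value of Z: X with probability 0.9, otherwise Y."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)   # X: mean 6, stdev 3.5
    return rng.gauss(-42.0, 5.0)     # Y: mean -42, stdev 5

rng = random.Random(12345)
samples = [sample_z(rng) for _ in range(100_000)]

# Sample mean; the exact mean of Z from the question is 1.2.
print(sum(samples) / len(samples))
```

Wrapping the branch in a function gives the "single function" the question asks for on the sampling side, while the density stays the weighted sum given above.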

一江春梦 2024-10-14 19:03:44

If a normal random variable is denoted x = (mean, stdev), then the following algebra applies (for independent variables):

number * x = ( number*mean, number*stdev )

x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )

So for the case of X = (mx, sx), Y = (my, sy), the linear combination is

Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) = 
    ( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
    ( 1.2, 3.19 )

Link: Normal Distribution (see the Miscellaneous section, item 1).

PS. Sorry for the weird notation. The new standard deviation is calculated by something similar to the Pythagorean theorem: it is the square root of the sum of squares.
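
The arithmetic in this answer can be reproduced with a short Python sketch. The helper names `scale` and `add` are made up for illustration, and the rules hold for the weighted sum w1*X + w2*Y of independent normal variables, which is a different construction from choosing X or Y at random.

```python
import math

def scale(w, x):
    """number * x = (number * mean, |number| * stdev)."""
    mean, stdev = x
    return (w * mean, abs(w) * stdev)

def add(x1, x2):
    """x1 + x2 = (mean1 + mean2, sqrt(stdev1^2 + stdev2^2)) for independent variables."""
    return (x1[0] + x2[0], math.hypot(x1[1], x2[1]))

X = (6.0, 3.5)
Y = (-42.0, 5.0)
Z = add(scale(0.9, X), scale(0.1, Y))

# Mean and standard deviation of 0.9*X + 0.1*Y.
print(Z)
```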

鲜血染红嫁衣 2024-10-14 19:03:44

This is the form of the distribution:

ListPlot[BinCounts[Table[If[RandomReal[] < .9,
    RandomReal[NormalDistribution[6, 3.5]], 
    RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}], 
    PlotRange -> Full, DataRange -> {-60, 20}]

[Plot: bin counts of 1,000,000 simulated values over the range -60 to 20]

It is NOT normal, because you are not adding normal variables, just choosing one or the other with a certain probability.

Edit

This is the curve for the sum of five variables with this distribution:

[Plot: distribution of the sum of five such variables]

The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
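
For the concrete question about the sum of five values, one possible sketch avoids simulation by conditioning on how many of the five draws came from Y. It assumes the five draws are independent; given k draws from Y, the sum is normal with mean 6(5 - k) - 42k and variance 3.5^2 (5 - k) + 5^2 k, and k follows a binomial distribution. The function names are hypothetical.

```python
import math

def normal_sf(x, mean, stdev):
    """P(N(mean, stdev) > x), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (stdev * math.sqrt(2.0)))

def prob_sum_exceeds(threshold, n=5, p_y=0.1):
    """P(sum of n independent draws of Z > threshold)."""
    total = 0.0
    for k in range(n + 1):  # k = number of draws that came from Y
        weight = math.comb(n, k) * p_y**k * (1.0 - p_y)**(n - k)
        mean = (n - k) * 6.0 + k * -42.0
        stdev = math.sqrt((n - k) * 3.5**2 + k * 5.0**2)
        total += weight * normal_sf(threshold, mean, stdev)
    return total

print(prob_sum_exceeds(10.0))
```

Under these assumptions the probability comes out at approximately 0.59, dominated by the case where all five draws come from X.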

梦途 2024-10-14 19:03:44

The most straightforward and generically applicable solution is to simulate the problem:

Run the piecewise function you have 1,000,000 times (just a high number) and generate a histogram of the results: split them into bins, then divide the count for each bin by your N (1,000,000 in my example). This gives the probability of Z falling in each bin; dividing that by the bin width leaves you with an approximation of the PDF of Z at every given bin.
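
A minimal Python sketch of this simulation follows. The bin range (-60 to 20), the bin width of 1 and the smaller sample count of 200,000 are arbitrary choices made here to keep the run short, and the function names are hypothetical.

```python
import random

def sample_z(rng):
    """The piecewise generator from the question."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)
    return rng.gauss(-42.0, 5.0)

def histogram_density(n=200_000, lo=-60.0, hi=20.0, width=1.0, seed=0):
    """Approximate the PDF of Z on [lo, hi) from n simulated values."""
    rng = random.Random(seed)
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for _ in range(n):
        z = sample_z(rng)
        if lo <= z < hi:  # values outside the range are simply not binned
            counts[int((z - lo) / width)] += 1
    # count / n is the probability of the bin; dividing by the width gives a density
    return [c / (n * width) for c in counts]

density = histogram_density()
peak = max(range(len(density)), key=density.__getitem__)
# Left edge of the tallest bin and the estimated density there.
print(-60.0 + peak * 1.0, density[peak])
```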

各空 2024-10-14 19:03:44

Lots of unknowns here, but essentially you just wish to add the two (or more) probability functions to one another.

For any given probability function you could calculate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the area is equal to your random number and use that as your value.

This process can then be generalized to any function (or sum of two or more functions).

Elaboration:
Suppose you have a distribution function f(x) defined for x from 0 to 1. You could calculate a random number based on the distribution by first calculating the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.

Now you generate a random number between 0 and A; let's call that number r. Then you need to find a value t such that the integral of f(x) from 0 to t is equal to r. t is your random number.

This process can be used for any probability density function f(x), including the sum of two (or more) probability density functions.

I'm not sure what your functions look like, so I'm not sure whether you can calculate analytic solutions for all this, but in the worst-case scenario you could use numeric techniques to approximate the effect.
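
The numeric version of this procedure could be sketched in Python as follows, applied to the weighted sum of the two normal densities from the question. The grid limits (-80 to 40), the step count and all function names are assumptions made for illustration; the running area is tabulated with the trapezoid rule and then searched for the point where it reaches the random number r.

```python
import bisect
import math
import random

def mixture_pdf(x):
    """0.9 * density of X + 0.1 * density of Y, with the question's parameters."""
    def normal_pdf(v, mean, stdev):
        z = (v - mean) / stdev
        return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))
    return 0.9 * normal_pdf(x, 6.0, 3.5) + 0.1 * normal_pdf(x, -42.0, 5.0)

def build_cumulative(pdf, lo, hi, steps):
    """Tabulate the running area under pdf on [lo, hi] with the trapezoid rule."""
    dx = (hi - lo) / steps
    xs = [lo + i * dx for i in range(steps + 1)]
    areas = [0.0]
    for i in range(steps):
        areas.append(areas[-1] + 0.5 * (pdf(xs[i]) + pdf(xs[i + 1])) * dx)
    return xs, areas

def sample_from_table(xs, areas, rng):
    """Pick r between 0 and the total area A, then find t where the area reaches r."""
    r = rng.random() * areas[-1]
    i = bisect.bisect_right(areas, r)        # areas[i - 1] <= r < areas[i]
    i = min(max(i, 1), len(areas) - 1)       # guard against rounding at the ends
    span = areas[i] - areas[i - 1]
    frac = (r - areas[i - 1]) / span if span > 0 else 0.0
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])  # linear interpolation for t

xs, areas = build_cumulative(mixture_pdf, -80.0, 40.0, 2400)
rng = random.Random(1)
draws = [sample_from_table(xs, areas, rng) for _ in range(50_000)]
# Total area A (close to 1 for a density) and the sample mean of the draws.
print(areas[-1], sum(draws) / len(draws))
```

Each draw needs only one uniform random number, at the cost of building and storing the table up front.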
