How to numerically sample from a joint, discrete probability distribution function
I have a 2D "heat map" or PDF that I need to recreate by random sampling, i.e. I have a 2D probability density map showing starting locations. I need to randomly choose starting locations with the same probability as the original PDF.
To do this, I think I need to first find the joint CDF (cumulative distribution function), then draw uniform random numbers to sample from the CDF. That's where I get stuck.
How do I numerically find the joint CDF of my PDF? I tried doing a cumulative sum along both dimensions, but that didn't yield the correct result. My knowledge of statistics is failing me.
EDIT: The heat map/PDF is in the form [x,y,z], where z is the intensity or probability at each x,y point.
Comments (4)
You could first go over the 2D density map and, for each (x,y) pair in it, find z by a lookup from the PDF. This gives you a starting point (x,y) with a probability of z, so each starting point has its own probability from the PDF. What you can do now is put the starting points in some fixed order, pick a random number, and map it to one of the starting points.
For example, let's say you have n starting points P1 .. Pn with probabilities p1 .. pn (normalized or weighted, so that they sum to 1). Pick a uniform random value p in [0, 1): pick P1 if p < p1, pick P2 if p1 <= p < p1+p2, pick P3 if p1+p2 <= p < p1+p2+p3, and so on. You can look at it as a histogram over the points P1 to Pn; the running sums p1, p1+p2, ... are its cumulative distribution function.
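The selection rule described above can be sketched in Python using only the standard library. The function name `sample_point` and its `rng` parameter are made up for this illustration, and it assumes the weights are non-negative with at least one positive value. Scaling the uniform draw by the total weight means the weights do not have to be normalized first.

```python
import bisect
import itertools
import random

def sample_point(points, weights, rng=random):
    """Pick one entry of `points` with probability proportional to its weight."""
    # Running sums p1, p1+p2, ...: the discrete CDF, scaled by the total weight.
    cdf = list(itertools.accumulate(weights))
    total = cdf[-1]
    u = rng.random() * total  # uniform draw, scaled to the total weight
    # Index of the first running sum that is strictly greater than u.
    i = bisect.bisect_right(cdf, u)
    # Clamp in case floating-point rounding pushes u up to `total`.
    return points[min(i, len(points) - 1)]

# Example: three starting points with weights 2 : 5 : 3.
start = sample_point([(0, 0), (0, 1), (1, 0)], [2, 5, 3])
```

Building the running sums costs O(n) per call here; if many samples are needed, it would make sense to compute them once and reuse them.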
Gibbs sampling should give you what you want:
http://en.wikipedia.org/wiki/Gibbs_sampling
Well, as observed in this answer, for my case it doesn't necessarily matter that my distribution is bivariate. Since I can normalize the whole thing so that it's a true PDF (the surface integrates to 1), I can rearrange the MxN matrix into a 1x(M*N) vector. Once I have that, I can do a cumulative integral (cumtrapz in MATLAB) and then sample from it: draw a uniform random number, find the index at which the cumulative curve reaches that value, and map the index back to its (x,y) position.
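A minimal sketch of this flatten-and-sample idea, written in standard-library Python rather than MATLAB. It uses a plain cumulative sum instead of `cumtrapz`, which amounts to treating each grid cell as a probability mass; that is an assumption of this sketch, not necessarily what the poster did. The name `sample_cells` and the nested-list layout of `pdf` are also made up for the illustration. Flattening helps because a cumulative sum along both dimensions gives the joint CDF F(x,y), a function of two variables that a single uniform number cannot invert, whereas the flattened running sum is one-dimensional.

```python
import bisect
import itertools
import random

def sample_cells(pdf, n_samples, rng=random):
    """Draw (row, col) index pairs from a 2D grid of non-negative weights."""
    n_cols = len(pdf[0])
    # Row-major flattening: cell (r, c) lands at position r * n_cols + c.
    flat = [w for row in pdf for w in row]
    cdf = list(itertools.accumulate(flat))  # 1D running sum over all cells
    total = cdf[-1]                         # scaling by this replaces normalization
    samples = []
    for _ in range(n_samples):
        u = rng.random() * total
        # First flat index whose running sum exceeds u; clamp against rounding.
        k = min(bisect.bisect_right(cdf, u), len(flat) - 1)
        samples.append(divmod(k, n_cols))   # back to (row, col)
    return samples

# Example: a 2x2 heat map where cell (1, 0) is three times as likely as (0, 1).
heat = [[0.0, 1.0],
        [3.0, 0.0]]
starts = sample_cells(heat, 5)
```

The returned pairs are grid indices; mapping them to the actual x and y coordinates depends on how the heat map's axes are stored.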
This is what I want to do as well!
I have a joint density function for two independent variables X and Y, and I now want to sample new (x,y) pairs from this distribution.
What I believe I have to do is find the joint cumulative distribution and then somehow sample from it, which is exactly what you seem to have done.
Could you perhaps be more specific about what you mean by using "uniform random numbers to find the corresponding index values"?
Just for reference: X is the size of ask orders and Y is the size of bid orders in the stock market.