An R question about correctly plotting a probability/density histogram

Posted on 2024-11-06 19:07:44


I have a matrix of dimension [500, 2], i.e. 500 rows and 2 columns. The left column gives the index of the X observations, and the right column gives the probability with which each X occurs, so it is a typical probability density relationship.

So my question is: how do I plot the histogram the right way, so that the x-axis is the x index and the y-axis is the density (0.01-1.00)? The bandwidth of the estimator is 0.33.

Thanks in advance!

The end of the data looks like this, just for a little orientation:

[490,]  2.338260830 0.04858685
[491,]  2.347839477 0.04797310
[492,]  2.357418125 0.04736149
[493,]  2.366996772 0.04675206
[494,]  2.376575419 0.04614482
[495,]  2.386154067 0.04553980
[496,]  2.395732714 0.04493702
[497,]  2.405311361 0.04433653
[498,]  2.414890008 0.04373835
[499,]  2.424468656 0.04314252
[500,]  2.434047303 0.04254907

@everyone,
yes, I did the estimation beforehand, so the bandwidth is what I mentioned. The data is ordered from low to high values, so the probability is 0.22 at the beginning, about 0.48 at the peak, and 0.15 at the end.

The density line plots like a charm, but what I have to do in addition is plot a histogram! So how can I do this and order the blocks properly (how the data should be split into bins, etc.)?

Any suggestions?

Here is a part of the data AFTER the estimation. All values are discrete, so I assume a histogram can be created, hopefully.

[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817

Best regards,
I appreciate the fast responses! (bow)

Would it do the job to create a histogram just for the indexes, grouping them into blocks of 25 or 50 each, for instance, and computing the average probability for each block (25, or 50/100/150/200/250, etc.) as the bins?


°如果伤别离去 2024-11-13 19:07:44

Assuming the rows are ordered from the lowest to the highest value of x, as they appear to be, you can use the default plot command; the only change you need is the type:

plot(your.data, type = 'l')

EDIT:

Ok, I'm not sure this is better than the density plot, but it can be done:

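# Example values: the standard normal density at 500 points on [-1, 1]
x <- dnorm(seq(-1, 1, length.out = 500))
# Assign every 10 consecutive points to one of 50 bins
x.bins <- rep(1:50, each = 10)
# Sum the values within each bin and draw one bar per bin
bars <- aggregate(x, by = list(x.bins), FUN = sum)[, 2]
barplot(bars)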

In your case, replace x with the probabilities from the second column of your matrix.
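
A minimal sketch of that substitution, under stated assumptions: the real 500 x 2 matrix is not available here, so `your.data` is rebuilt as a stand-in from a kernel density estimate of simulated data (the `rnorm` sample and the seed are hypothetical; only the bandwidth 0.33 and the 500 rows come from the question).

```r
# Stand-in for the real matrix: column 1 = x grid, column 2 = estimated density.
# The simulated sample is an assumption; replace your.data with the actual matrix.
set.seed(1)
d <- density(rnorm(200), bw = 0.33, n = 500)
your.data <- cbind(d$x, d$y)

# 50 bins of 10 consecutive rows each, as in the example above
x.bins <- rep(1:50, each = 10)

# Sum the second column (the probabilities) within each bin
bars <- tapply(your.data[, 2], x.bins, sum)
barplot(bars)
```

With a different number of rows, `rep(1:50, each = 10)` has to be adjusted so that its length matches `nrow(your.data)`.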

EDIT2:

On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function, adding them together as I have done is incorrect. Mathematically, I don't think you can produce the binned probability for a range using only a few points from within that range.
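
If the rows are evenly spaced evaluations of a continuous density, one possible workaround is an approximation rather than an exact binned probability: use the mean density within each bin as the bar height, which keeps the y-axis on the density scale. Multiplying that height by the bin width then gives a rough estimate of the probability mass in the bin. The sketch below assumes simulated stand-in data and 20 equal-width bins; both are illustrative choices, not taken from the question.

```r
# Stand-in density estimate (hypothetical data; bandwidth 0.33 as in the question)
set.seed(1)
d <- density(rnorm(200), bw = 0.33, n = 500)
x <- d$x
y <- d$y

# 20 equal-width bins spanning the x range (the bin count is an arbitrary choice)
breaks <- seq(min(x), max(x), length.out = 21)
bin <- cut(x, breaks, include.lowest = TRUE)

# Bar height = mean density inside each bin (stays on the density scale)
bar.height <- tapply(y, bin, mean)

# Rough probability mass per bin = mean density * bin width
bin.prob <- bar.height * diff(breaks)

# Density line with the histogram-style bars drawn on the same axes
plot(x, y, type = "l", xlab = "x", ylab = "density")
rect(head(breaks, -1), 0, breaks[-1], bar.height, border = "grey40")
```

Because the bars are averages of the curve, they follow the density line closely; the approximation gets coarser as the bins get wider.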
