Getting frequency values from a histogram in R

Posted 2024-12-09 17:29:11

I know how to draw histograms and other frequency/percentage-related tables.
But now I want to know how I can get those frequency values into a table so I can use them afterwards.

I have a massive dataset, and I draw a histogram with a set binwidth. I want to extract the frequency value (i.e. the value on the y-axis) that corresponds to each bin and save it somewhere.

Can someone please help me with this?
Thank you!

记忆里有你的影子 2024-12-16 17:29:11

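The hist function returns an object of class "histogram" (invisibly, when it also draws the plot), so you can assign the result and inspect its components; $counts holds the frequency for each bin: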

R> res <- hist(rnorm(100))
R> res
$breaks
[1] -4 -3 -2 -1  0  1  2  3  4

$counts
[1]  1  2 17 27 34 16  2  1

$intensities
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$density
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$mids
[1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5

$xname
[1] "rnorm(100)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
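Since the question asks for a table of per-bin frequencies that can be saved, here is a minimal sketch built on that return value. It assumes a bin width of 0.5 and uses simulated data in place of the real dataset; `binwidth`, `freq_table` and the output file name are illustrative choices, not part of the original answer. Passing `plot = FALSE` computes the bins without drawing anything.

```r
# Simulated stand-in for the real (large) dataset (assumption for illustration)
set.seed(42)
x <- rnorm(1000)

# Assumed bin width; breaks run from floor(min) to ceiling(max) so every value is counted
binwidth <- 0.5
brks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)

# plot = FALSE computes the histogram object without drawing it
h <- hist(x, breaks = brks, plot = FALSE)

# One row per bin: boundaries, midpoint, count, and percentage of all observations
freq_table <- data.frame(
  lower   = head(h$breaks, -1),
  upper   = tail(h$breaks, -1),
  mid     = h$mids,
  count   = h$counts,
  percent = 100 * h$counts / sum(h$counts)
)
print(freq_table)

# Save the table; tempdir() keeps this example out of the working directory
out_file <- file.path(tempdir(), "hist_counts.csv")
write.csv(freq_table, out_file, row.names = FALSE)
```

`head(h$breaks, -1)` and `tail(h$breaks, -1)` pair each bin's lower and upper boundary, because `breaks` always has one more element than `counts`.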
沧桑㈠ 2024-12-16 17:29:11

From ?hist:
Value

an object of class "histogram" which is a list with components:

  • breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
  • counts: n integers; for each cell, the number of x[] inside.
  • density: values $\hat{f}(x_i)$, as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy $\sum_i \hat{f}(x_i)\,(b_{i+1} - b_i) = 1$, where $b_i$ = breaks[i].
  • intensities: same as density. Deprecated, but retained for compatibility.
  • mids: the n cell midpoints.
  • xname: a character string with the actual x argument name.
  • equidist: logical, indicating if the distances between breaks are all the same.

breaks and density provide just about all you need (counts gives the raw frequency in each bin):

x <- rnorm(100)          # example data
histrv <- hist(x)
histrv$breaks            # bin boundaries
histrv$density           # estimated density for each bin
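The relationship quoted from ?hist can be checked numerically. This sketch uses simulated data and deliberately unequal bin widths (both assumptions for illustration) to show that `density` equals `counts / (n * diff(breaks))`, so the densities weighted by bin width sum to 1, while `counts` holds the raw frequencies the question asks about.

```r
# Simulated data and unequal bin widths (illustrative assumptions)
set.seed(1)
x <- runif(200, 0, 10)
h <- hist(x, breaks = c(0, 1, 3, 6, 10), plot = FALSE)

widths <- diff(h$breaks)   # width of each bin
n <- sum(h$counts)         # total number of observations (200)

# density is the count scaled by sample size and bin width
all.equal(h$density, h$counts / (n * widths))

# so density times bin width sums to 1 over all bins
sum(h$density * widths)

# counts holds the raw per-bin frequencies
h$counts
```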
对你而言 2024-12-16 17:29:11

Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.

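The following convenience function outputs a dataframe with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the mid-point of each bin (x), and the value plotted on the y-axis (y). With the default aesthetics y is the bin count; in the example below it is the cumulative count, because y is mapped to the cumulative sum of the counts.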

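library(ggplot2)

## Convenience function: returns the bin midpoints (x), bin limits (xmin, xmax)
## and the plotted y values from the first layer of a built ggplot object
get_hist <- function(p) {
    d <- ggplot_build(p)$data[[1]]
    data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
}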

# make a dataframe for ggplot
set.seed(1)
x = runif(100, 0, 10)
y = cumsum(x)
df <- data.frame(x = sort(x), y = y)

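# make geom_histogram; y is mapped to the cumulative count per bin
# (after_stat() replaces the ..count.. notation deprecated in ggplot2 3.4.0)
p <- ggplot(data = df, aes(x = x)) +
    geom_histogram(aes(y = after_stat(cumsum(count))), binwidth = 1, boundary = 0,
                   color = "black", fill = "white")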

Illustration:

hist_df <- get_hist(p)
head(hist_df$x)
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(hist_df$y)
## [1]  7 13 24 38 52 57
head(hist_df$xmax)
## [1] 1 2 3 4 5 6
head(hist_df$xmin)
## [1] 0 1 2 3 4 5

A related question I answered here (Cumulative histogram with ggplot2).
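With the default aesthetics, geom_histogram() maps y to the per-bin count, so the built layer data already contains a `count` column. The sketch below assumes ggplot2 is installed and uses `layer_data()`, which returns the same data frame as `ggplot_build(p)$data[[1]]`; it then cross-checks the counts against base R's `hist()` using the same bin boundaries. The data are simulated as in the answer, and the names `p2` and `bins` are illustrative.

```r
library(ggplot2)

# Same simulated data as above; no cumulative mapping this time
set.seed(1)
df <- data.frame(x = runif(100, 0, 10))

p2 <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 1, boundary = 0)

# layer_data(p2) is equivalent to ggplot_build(p2)$data[[1]]
bins <- layer_data(p2)[, c("xmin", "xmax", "x", "count")]
print(bins)

# Cross-check with base R using the same bin boundaries.
# Both default to right-closed bins, so the counts should agree.
h <- hist(df$x, breaks = c(bins$xmin[1], bins$xmax), plot = FALSE)
all(h$counts == bins$count)
```

Dropping the `aes(y = ...)` mapping is what makes y equal to `count` here; with the cumulative mapping used in the answer, the two columns differ.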
