r graph ggplot2

Using stat_function and facet_wrap together in ggplot2 in R

Posted on 2024-08-04 00:40:51

I am trying to plot lattice type data with ggplot2 and then superimpose a normal distribution over the sample data to illustrate how far off normal the underlying data is. I would like to have the normal dist on top to have the same mean and stdev as the panel.

here's an example:

library(ggplot2)

#make some example data
dd<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")

#This works
pg <- ggplot(dd) + geom_density(aes(x=Predicted_value)) +  facet_wrap(~State_CD)
print(pg)

That all works great and produces a nice three panel graph of the data. How do I add the normal dist on top? It seems I would use stat_function, but this fails:

#this fails
pg <- ggplot(dd) + geom_density(aes(x=Predicted_value)) + stat_function(fun=dnorm) +  facet_wrap(~State_CD)
print(pg)

It appears that the stat_function is not getting along with the facet_wrap feature. How do I get these two to play nicely?

------------EDIT---------

I tried to integrate ideas from two of the answers below and I am still not there:

using a combination of both answers I can hack together this:

library(ggplot2)
library(plyr)

#make some example data
dd<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")

DevMeanSt <- ddply(dd, c("State_CD"), function(df)mean(df$Predicted_value)) 
colnames(DevMeanSt) <- c("State_CD", "mean")
DevSdSt <- ddply(dd, c("State_CD"), function(df)sd(df$Predicted_value) )
colnames(DevSdSt) <- c("State_CD", "sd")
DevStatsSt <- merge(DevMeanSt, DevSdSt)

pg <- ggplot(dd, aes(x=Predicted_value))
pg <- pg + geom_density()
pg <- pg + stat_function(fun=dnorm, colour='red', args=list(mean=DevStatsSt$mean, sd=DevStatsSt$sd))
pg <- pg + facet_wrap(~State_CD)
print(pg)

which is really close... except something is wrong with the normal dist plotting:

what am I doing wrong here?

长途伴 2024-08-11 00:40:52

Originally posted as an answer to this question, I was encouraged to share my solution here too.

I too became frustrated with overlaying theoretical densities over empirical data, so I wrote a function that automates the process. Since 2009, when this question was first posed, ggplot2 has become far more extensible, so I've put it in an extension package on github (EDIT: you can find it on CRAN now).

library(ggplot2)
library(ggh4x)

set.seed(0)

# Make the example data
dd <- data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),
                 c(rep("A",24),rep("B",24),rep("C",24)))
colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")

ggplot(dd, aes(Predicted_value)) +
  geom_density() +
  stat_theodensity(colour = "red") +
  facet_wrap(~ State_CD)

Created on 2021-01-28 by the reprex package (v0.3.0)

孤芳又自赏 2024-08-11 00:40:52

If you are willing to use ggformula, then this is pretty easy. (It is also possible to mix and match and use ggformula just for the distribution overlay, but I'll illustrate the full on ggformula approach.)

library(ggformula)
theme_set(theme_bw())

gf_dens( ~ Sepal.Length | Species, data = iris) %>%
  gf_fitdistr(color = "red") %>% 
  gf_fitdistr(dist = "gamma", color = "blue")

Created on 2019-01-15 by the reprex package (v0.2.1)

泪冰清 2024-08-11 00:40:52

I think you need to provide more information. This seems to work:

 pg <- ggplot(dd, aes(Predicted_value)) ## need aesthetics in the ggplot
 pg <- pg + geom_density() 
 ## gotta provide the arguments of the dnorm
 pg <- pg + stat_function(fun=dnorm, colour='red',            
            args=list(mean=mean(dd$Predicted_value), sd=sd(dd$Predicted_value)))
 ## wrap it!
 pg <- pg + facet_wrap(~State_CD)
 pg

We are providing the same mean and sd parameter for every panel. Getting panel specific means and standard deviations is left as an exercise to the reader* ;)

'*' In other words, not sure how it can be done...
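The panel-specific parameters themselves can at least be computed in base R. The following is a minimal sketch, assuming the question's column names (`Predicted_value`, `State_CD`); the seed and the names `pooled` and `panel_stats` are made up for illustration. It only computes the numbers and does not make `stat_function` use them per panel.

```r
# Sketch only: compare the pooled normal parameters with per-panel ones.
set.seed(1)  # assumed seed, only so the numbers are reproducible
dd <- data.frame(
  Predicted_value = rnorm(72, mean = 2, sd = 2),
  State_CD = rep(c("A", "B", "C"), each = 24)
)

# Pooled parameters: what the stat_function call above passes to every panel.
pooled <- c(mean = mean(dd$Predicted_value), sd = sd(dd$Predicted_value))

# Per-panel parameters: one row per level of State_CD.
panel_stats <- data.frame(
  State_CD = sort(unique(dd$State_CD)),
  mean = as.numeric(tapply(dd$Predicted_value, dd$State_CD, mean)),
  sd   = as.numeric(tapply(dd$Predicted_value, dd$State_CD, sd))
)

print(pooled)
print(panel_stats)
```

With random data the three rows of `panel_stats` will generally differ from each other and from `pooled`, which is why a single pooled curve does not describe each panel.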

尘世孤行 2024-08-11 00:40:52

If you would rather not generate the normal distribution line-graph "by hand", but still want to use stat_function and show the graphs side by side, you could consider the "multiplot" function published on "Cookbook for R" as an alternative to facet_wrap. You can copy the multiplot code into your project from that site.

After you copy the code, do the following:

library(ggplot2)

# Some fake data (copied from hadley's answer)
dd <- data.frame(
  predicted = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
) 

# Split the data by state, apply a function to each slice that turns it into a
# plot object, and collect the results in a list.
plots <- lapply(split(dd,dd$state),FUN=function(state_slice){ 
  # The code here is the plot code generation. You can do anything you would 
  # normally do for a single plot, such as calling stat_function, and you do this 
  # one slice at a time.
  ggplot(state_slice, aes(predicted)) + 
    geom_density() + 
    stat_function(fun=dnorm, 
                  args=list(mean=mean(state_slice$predicted), 
                            sd=sd(state_slice$predicted)),
                  color="red")
})

# Finally, present the plots on 3 columns.
multiplot(plotlist = plots, cols=3)

林空鹿饮溪 2024-08-11 00:40:52

I think your best bet is to draw the line manually with geom_line.

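library(ggplot2)

dd<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")
dd$State_CD<-factor(dd$State_CD) #needed since R 4.0, where data.frame() no longer creates factors by default
dd$Predicted_value<-dd$Predicted_value*as.numeric(dd$State_CD) #make different by state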

##Calculate means and standard deviations by level
means<-as.numeric(by(dd[,2],dd$State_CD,mean))
sds<-as.numeric(by(dd[,2],dd$State_CD,sd))

##Create evenly spaced evaluation points +/- 3 standard deviations away from the mean
dd$vals<-0
for(i in 1:length(levels(dd$State_CD))){
    dd$vals[dd$State_CD==levels(dd$State_CD)[i]]<-seq(from=means[i]-3*sds[i], 
                            to=means[i]+3*sds[i],
                            length.out=sum(dd$State_CD==levels(dd$State_CD)[i]))
}
##Create normal density points
dd$norm<-with(dd,dnorm(vals,means[as.numeric(State_CD)],
                        sds[as.numeric(State_CD)]))


pg <- ggplot(dd, aes(Predicted_value)) 
pg <- pg + geom_density() 
pg <- pg + geom_line(aes(x=vals,y=norm),colour="red") #Add in normal distribution
pg <- pg + facet_wrap(~State_CD,scales="free")
pg

征棹 2024-08-11 00:40:51

stat_function is designed to overlay the same function in every panel. (There's no obvious way to match up the parameters of the function with the different panels).

As Ian suggests, the best way is to generate the normal curves yourself, and plot them as a separate dataset (this is where you were going wrong before - merging just doesn't make sense for this example and if you look carefully you'll see that's why you're getting the strange sawtooth pattern).
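One way to see where a sawtooth can come from, offered as an illustration and not as a statement about ggplot2 internals: `dnorm` is vectorized over `mean` and `sd`, so handing it three means and three sds recycles them across the x values, and consecutive points are evaluated against different distributions. The parameter values below are made up.

```r
# Sketch only: argument recycling in dnorm() with hypothetical parameters.
x     <- seq(-2, 6, length.out = 9)
means <- c(0, 2, 4)   # hypothetical per-panel means
sds   <- c(1, 1, 1)   # hypothetical per-panel standard deviations

# A single call with vector parameters, as in args = list(mean = ..., sd = ...)
recycled <- dnorm(x, mean = means, sd = sds)

# The same values computed point by point: x[i] is paired with the
# parameter at position ((i - 1) %% 3) + 1, cycling through the three groups.
by_hand <- vapply(seq_along(x), function(i) {
  k <- ((i - 1) %% length(means)) + 1
  dnorm(x[i], mean = means[k], sd = sds[k])
}, numeric(1))

print(all.equal(recycled, by_hand))
```

Because neighbouring x values are matched with different means, the resulting line jumps between three curves, and it has the same shape in every panel.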

Here's how I'd go about solving the problem:

library(ggplot2)
library(plyr)  # provides ddply

dd <- data.frame(
  predicted = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
) 

grid <- with(dd, seq(min(predicted), max(predicted), length.out = 100))
normaldens <- ddply(dd, "state", function(df) {
  data.frame( 
    predicted = grid,
    density = dnorm(grid, mean(df$predicted), sd(df$predicted))
  )
})

ggplot(dd, aes(predicted))  + 
  geom_density() + 
  geom_line(aes(y = density), data = normaldens, colour = "red") +
  facet_wrap(~ state)
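
If plyr is not available, the same per-state curves can be built with base R. This is a sketch of an equivalent, not part of the answer above: it swaps `ddply` for `split` + `lapply` + `rbind`, uses an assumed seed, and only draws the plot when ggplot2 is installed.

```r
# Sketch only: base-R replacement for the ddply() step.
set.seed(1)  # assumed seed, only for reproducibility
dd <- data.frame(
  predicted = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
)

# Shared evaluation grid covering the full range of the data.
grid <- seq(min(dd$predicted), max(dd$predicted), length.out = 100)

# One normal curve per state, each using that state's own mean and sd.
normaldens <- do.call(rbind, lapply(split(dd, dd$state), function(df) {
  data.frame(
    state = df$state[1],
    predicted = grid,
    density = dnorm(grid, mean(df$predicted), sd(df$predicted))
  )
}))
rownames(normaldens) <- NULL

# Plot as in the answer: the curves are a separate dataset carrying the
# faceting variable, so each one lands in its own panel.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dd, aes(predicted)) +
    geom_density() +
    geom_line(aes(y = density), data = normaldens, colour = "red") +
    facet_wrap(~ state)
  print(p)
}
```

The key point is unchanged: `normaldens` has a `state` column, which is what lets `facet_wrap` route each curve to the matching panel.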