Adding an exponential geom_smooth in ggplot2 / R
I am trying to produce some example graphics using ggplot2, and one of the examples I picked is the birthday problem, here using code 'borrowed' from a Revolution Computing presentation at OSCON.
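library(plyr)     # ddply
library(ggplot2)

birthday <- function(n) {
  ntests <- 1000
  pop <- 1:365
  anydup <- function(i) {
    any(duplicated(sample(pop, n, replace = TRUE)))
  }
  # Proportion of simulated groups with at least one shared birthday
  sum(sapply(seq(ntests), anydup)) / ntests
}
x <- data.frame(x = rep(1:100, each = 5))
# Each group holds one group size repeated 5 times, so pass a single value
x <- ddply(x, .(x), function(df) {
  data.frame(x = df$x, prob = birthday(df$x[1]))
})
birthdayplot <- ggplot(x, aes(x, prob)) +
  geom_point() + geom_smooth() +
  theme_bw() +
  # opts() has been removed from ggplot2; the title is set through labs()
  labs(title = "Probability that at least two people share a birthday in a random group",
       x = "Size of Group", y = "Probability")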
Here my graph is what I would describe as exponential, but the geom_smooth doesn't fit the data particularly well. I've tried the loess method, but this didn't change things much. Can anyone suggest how to add a better smooth?
Thanks
Paul.
Comments (2)
The smoothing routine does not react fast enough to the sudden change at low values of x (and it has no way of knowing that the values of prob are restricted to the 0-1 range). Since your data have so little variability, a quick solution is to reduce the span of values over which the smoothing at each point is done. Check out the red line in this plot:

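The plot and code that accompanied this answer are not preserved here. As a rough sketch of the idea, the snippet below overlays a default loess smooth with a narrower one in red. The span value of 0.1, the reduced number of simulations per point, and the names dat and p are assumptions made for illustration, not the answerer's original choices.

```r
library(ggplot2)

# Simulated probability of a shared birthday in a group of size n
# (ntests = 200 is an assumption chosen to keep the example fast)
birthday <- function(n, ntests = 200) {
  mean(replicate(ntests, any(duplicated(sample(365, n, replace = TRUE)))))
}

set.seed(1)
dat <- data.frame(x = rep(1:100, each = 5))
dat$prob <- sapply(dat$x, birthday)

p <- ggplot(dat, aes(x, prob)) +
  geom_point() +
  # default loess span (0.75): too stiff for the steep rise at low x
  geom_smooth(method = "loess", formula = y ~ x) +
  # narrower span (assumed value): follows the points much more closely
  geom_smooth(method = "loess", formula = y ~ x, span = 0.1, colour = "red") +
  theme_bw() +
  labs(x = "Size of Group", y = "Probability")
```

Printing p draws both curves. A smaller span makes the smoother more local, which works here only because the simulated points have little scatter; on noisier data it would start to chase the noise.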
The problem is that the probabilities follow a logistic curve. You could fit a proper smoothing line if you change the birthday function to return the raw successes and failures instead of the probabilities.
Now, you'll have to add the points as a summary, and specify a logistic regression as the smoothing type.
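The code for this answer is also missing from this copy, so the following is only a sketch of the approach as described: return one 0/1 outcome per simulated group, plot the per-size means with stat_summary, and fit a binomial GLM as the smoother. The helper birthday_raw, the column names n and dup, and the 200 simulations per group size are assumptions introduced for the example.

```r
library(ggplot2)

# Return the raw outcomes (1 = at least one shared birthday, 0 = none)
# instead of their proportion; birthday_raw is a hypothetical helper name
birthday_raw <- function(n, ntests = 200) {
  dups <- replicate(ntests, any(duplicated(sample(365, n, replace = TRUE))))
  data.frame(n = n, dup = as.integer(dups))
}

set.seed(1)
raw <- do.call(rbind, lapply(1:100, birthday_raw))

p <- ggplot(raw, aes(n, dup)) +
  # points: mean of the 0/1 outcomes at each group size
  stat_summary(fun = mean, geom = "point") +
  # smoother: logistic regression fitted to the raw outcomes
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial)) +
  theme_bw() +
  labs(x = "Size of Group", y = "Probability")
```

Because the GLM works on the logit scale, the fitted line cannot leave the 0-1 range, which the loess smooth has no way to guarantee. R may warn that some fitted probabilities are numerically 0 or 1 for large groups, where every simulated group contains a duplicate.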