在 ggplot2 / R 中添加指数 geom_smooth

发布于 2024-09-15 10:08:41 字数 981 浏览 7 评论 0原文

我正在尝试使用 ggplot2 生成一些示例图形,我选择的示例之一是 生日问题,这里使用的是从 Revolution 计算演示中“借用”的代码在奥斯康。

birthday<-function(n){
    ntests<-1000
    pop<-1:365
    anydup<-function(i){
        any(duplicated(sample(pop,n,replace=TRUE)))
        }
    sum(sapply(seq(ntests), anydup))/ntests
    }

x<-data.frame(x=rep(1:100, each=5)) 
x<-ddply(x, .(x), function(df) {return(data.frame(x=df$x, prob=birthday(df$x)))})
birthdayplot<-ggplot(x, aes(x, prob))+
        geom_point()+geom_smooth()+
        theme_bw()+
        opts(title = "Probability that at least two people share a birthday in a random group")+
        labs(x="Size of Group", y="Probability")

这里我的图表是我所描述的指数图表,但 geom_smooth 不太适合数据。我尝试过黄土方法,但这并没有太大改变。谁能建议如何添加更好的平滑度?

谢谢

保罗。

I am trying to produce some example graphics using ggplot2, and one of the examples I picked was the birthday problem, here using code 'borrowed' from a Revolution computing presentation at Oscon.

birthday<-function(n){
    ntests<-1000
    pop<-1:365
    anydup<-function(i){
        any(duplicated(sample(pop,n,replace=TRUE)))
        }
    sum(sapply(seq(ntests), anydup))/ntests
    }

x<-data.frame(x=rep(1:100, each=5)) 
x<-ddply(x, .(x), function(df) {return(data.frame(x=df$x, prob=birthday(df$x)))})
birthdayplot<-ggplot(x, aes(x, prob))+
        geom_point()+geom_smooth()+
        theme_bw()+
        opts(title = "Probability that at least two people share a birthday in a random group")+
        labs(x="Size of Group", y="Probability")

Here my graph is what I would describe as exponential, but the geom_smooth doesn't fit the data particularly well. I've tried the loess method but this didn't change things much. Can anyone suggest how to add a better smooth ?

Thanks

Paul.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

静若繁花 2024-09-22 10:08:41

平滑例程无法足够快地对 x 的低值的突然变化做出反应(并且它无法知道 prob 的值被限制为 0- 1 个范围)。由于变异性如此之低,一个快速的解决方案是减少在每个点进行平滑的值的跨度。查看该图中的红线:

birthdayplot + geom_smooth(span=0.1, colour="red")

The smoothing routine does not react to the sudden change for low values of x fast enough (and it has no way of knowing that the values of prob are restricted to a 0-1 range). Since you have so low variability, a quick solution is to reduce the span of values over which smoothing at each point is done. Check out the red line in this plot:

birthdayplot + geom_smooth(span=0.1, colour="red")
偏爱你一生 2024-09-22 10:08:41

问题在于概率遵循逻辑曲线。如果您更改生日函数以返回原始的成功和失败而不是概率,则可以拟合适当的平滑线。

birthday<-function(n){
  ntests<-1000
  pop<-1:365
  anydup<-function(i){
    any(duplicated(sample(pop,n,replace=TRUE)))
  }
  data.frame(Dups = sapply(seq(ntests), anydup) * 1, n = n)
}
x<-ddply(x, .(x),function(df) birthday(df$x))

现在,您必须添加点作为摘要,并指定逻辑回归作为平滑类型。

ggplot(x, aes(n, Dups)) +
  stat_summary(fun.y = mean, geom = "point") +
  stat_smooth(method = "glm", family = binomial)

The problem is that the probabilities follow a logistic curve. You could fit a proper smoothing line if you change the birthday function to return the raw successes and failures instead of the probabilities.

birthday<-function(n){
  ntests<-1000
  pop<-1:365
  anydup<-function(i){
    any(duplicated(sample(pop,n,replace=TRUE)))
  }
  data.frame(Dups = sapply(seq(ntests), anydup) * 1, n = n)
}
x<-ddply(x, .(x),function(df) birthday(df$x))

Now, you'll have to add the points as a summary, and specify a logistic regression as the smoothing type.

ggplot(x, aes(n, Dups)) +
  stat_summary(fun.y = mean, geom = "point") +
  stat_smooth(method = "glm", family = binomial)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文