ddply returns too many results

Posted on 2024-11-01 18:12:18

For some reason I'm getting more rows than I expected since upgrading to R 2.13.0 and plyr 1.5.1 (plyr_1.5.1.tar.gz). I had previously run this on an older version of plyr (unfortunately I'm not sure which version, as I've just overwritten it).

library(plyr)
dd <- data.frame(matrix(rnorm(216), 72, 3),
                 c(rep("A", 24), rep("B", 24), rep("C", 24)),
                 c(rep("J", 36), rep("K", 36)))
colnames(dd) <- c("v1", "v2", "v3", "dim1", "dim2")

results1 <- ddply(dd, c("dim1","dim2"), function(df) c(m1=mean(df$v1)) )
results2 <- ddply(dd, c("dim1","dim2"), function(df) { c(m1=mean(df$v1),
    m2=mean(df$v2)) } )
results3 <- ddply(dd, c("dim1","dim2"), function(df) { c(m1=mean(df$v1),
    m2=mean(df$v2), m3=mean(df$v3)) } )

I don't understand why results2 has twice as many rows as results1, and results3 has three times as many. The rows of the original results1 are simply replicated two or three times.

I had a handy copy of R version 2.11.0 Patched (2010-05-01 r51907) with an old version of plyr. The results I was expecting were:

> results1
  dim1 dim2          m1
1    A    J  0.07312783
2    B    J -0.22428746
3    B    K -0.44205832
4    C    K  0.21421456
> results2
  dim1 dim2          m1         m2
1    A    J  0.07312783 -0.1130148
2    B    J -0.22428746  0.4394832
3    B    K -0.44205832 -0.1934018
4    C    K  0.21421456 -0.0178809
> results3
  dim1 dim2          m1         m2          m3
1    A    J  0.07312783 -0.1130148 -0.03175873
2    B    J -0.22428746  0.4394832  0.21581696
3    B    K -0.44205832 -0.1934018 -0.28313530
4    C    K  0.21421456 -0.0178809 -0.21948430

The results I get from R version 2.13.0 (2011-04-13):

> results1
  dim1 dim2         m1
1    A    J -0.2270726
2    B    J  0.5860493
3    B    K -0.5986129
4    C    K  0.3135809
> results2
  dim1 dim2         m1          m2
1    A    J -0.2270726 -0.19037813
2    B    J  0.5860493 -0.05385395
3    B    K -0.5986129  0.29404095
4    C    K  0.3135809 -0.26744010
5    A    J -0.2270726 -0.19037813
6    B    J  0.5860493 -0.05385395
7    B    K -0.5986129  0.29404095
8    C    K  0.3135809 -0.26744010
> results3
   dim1 dim2         m1          m2          m3
1     A    J -0.2270726 -0.19037813 -0.20448734
2     B    J  0.5860493 -0.05385395 -0.11190857
3     B    K -0.5986129  0.29404095 -0.27072101
4     C    K  0.3135809 -0.26744010 -0.03184949
5     A    J -0.2270726 -0.19037813 -0.20448734
6     B    J  0.5860493 -0.05385395 -0.11190857
7     B    K -0.5986129  0.29404095 -0.27072101
8     C    K  0.3135809 -0.26744010 -0.03184949
9     A    J -0.2270726 -0.19037813 -0.20448734
10    B    J  0.5860493 -0.05385395 -0.11190857
11    B    K -0.5986129  0.29404095 -0.27072101
12    C    K  0.3135809 -0.26744010 -0.03184949

Why does results2 have 8 rows instead of 4, and results3 12 rows instead of 4?

Thanks,
Sean

Comments (2)

蓝眼泪 2024-11-08 18:12:18

This will be fixed shortly in plyr 1.5.2
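
To see which plyr version is installed, something like the following should work. It is only a sketch: it assumes plyr is installed, and the 1.5.2 threshold is taken from the comment above rather than verified independently.

```r
# Report the installed plyr version (assumes plyr is installed)
installed <- packageVersion("plyr")
print(installed)

# TRUE if the installed version is at least 1.5.2,
# the release the comment above says will carry the fix
at_least_152 <- installed >= "1.5.2"
print(at_least_152)
```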

谢绝鈎搭 2024-11-08 18:12:18

It's the c() function inside your ddply() that's causing the problem.
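
As a rough base-R illustration of the difference in what the function returns (the small `df` below is a made-up stand-in for one group's subset, not part of the original data): `c()` gives a named numeric vector with no row structure, while `data.frame()` gives a one-row data frame. This only shows the two return shapes; it does not reproduce the plyr 1.5.1 behaviour itself.

```r
# Hypothetical stand-in for the subset of dd that one group would receive
df <- data.frame(v1 = c(1, 2, 3), v2 = c(4, 5, 6))

# What the original function returns: a named numeric vector
as_vector <- c(m1 = mean(df$v1), m2 = mean(df$v2))

# What the data.frame() variant returns: a one-row data frame
as_frame <- data.frame(m1 = mean(df$v1), m2 = mean(df$v2))

is.data.frame(as_vector)  # FALSE
dim(as_vector)            # NULL: a plain vector has no rows or columns
dim(as_frame)             # 1 2: one row, two columns
```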

Here are three alternative ways that you can write your statement for results3, progressively getting simpler:

  1. Use data.frame inside your function:

    ddply(dd, c("dim1","dim2"), function(df) {data.frame(m1=mean(df$v1),
    m2=mean(df$v2), m3=mean(df$v3)) } )

  2. Use summarise:

    ddply(dd, .(dim1, dim2), summarise, m1=mean(v1), m2=mean(v2), m3=mean(v3))

  3. Use numcolwise (the summary columns keep their original names v1, v2, v3 rather than m1, m2, m3):

    ddply(dd, .(dim1, dim2), numcolwise(mean))

In each case the results are what you would expect:

  dim1 dim2          m1         m2          m3
1    A    J -0.04272659 -0.1468376  0.17902942
2    B    J -0.10133503 -0.1427358 -0.05241214
3    B    K  0.29698847 -0.0989732  0.14422812
4    C    K  0.04108324  0.2014864 -0.15893221
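
If you want to guard against the replicated rows, one possible check is to compare the number of result rows with the number of distinct (dim1, dim2) combinations. This is just a sketch using the summarise variant; the seed and the names `res` and `n_groups` are arbitrary choices for illustration.

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
dd <- data.frame(matrix(rnorm(216), 72, 3),
                 c(rep("A", 24), rep("B", 24), rep("C", 24)),
                 c(rep("J", 36), rep("K", 36)))
colnames(dd) <- c("v1", "v2", "v3", "dim1", "dim2")

res <- ddply(dd, .(dim1, dim2), summarise,
             m1 = mean(v1), m2 = mean(v2), m3 = mean(v3))

# Number of distinct (dim1, dim2) combinations actually present in dd
n_groups <- nrow(unique(dd[c("dim1", "dim2")]))

# One summary row per group; this stops with an error if rows were replicated
stopifnot(nrow(res) == n_groups)
```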