r dataframe data-manipulation data-management

Collapse a data frame by group using a different function for each variable

Posted on 2024-11-17 11:29:10


Define

df<-read.table(textConnection('egg 1 20 a
                        egg 2 30 a
                        jap 3 50 b
                        jap 1 60 b'))

so that:

> df
   V1 V2 V3 V4
1 egg  1 20  a
2 egg  2 30  a
3 jap  3 50  b
4 jap  1 60  b

My real data has no factors, so I convert the factor columns to character:

> df$V1 <- as.character(df$V1)
> df$V4 <- as.character(df$V4)
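Alternatively, the conversion step can be skipped by asking `read.table` not to create factors in the first place. The sketch below reads the same example data with `stringsAsFactors = FALSE`, which has been the default since R 4.0.0. It uses the `text =` argument of `read.table` in place of `textConnection`, which is assumed to be available in your R version (it is in any recent one).

```r
# Same example data, but strings are read as character rather than factor.
# stringsAsFactors = FALSE is the default from R 4.0.0 on; being explicit
# also covers older versions of R.
df <- read.table(text = "egg 1 20 a
egg 2 30 a
jap 3 50 b
jap 1 60 b", stringsAsFactors = FALSE)

# V1 and V4 should now be character columns, V2 and V3 integer columns
str(df)
```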

I would like to "collapse" the data frame by V1 keeping:

The max of V2
The mean of V3
The mode of V4 (this value does not actually change within a V1 group, so first, last, etc. would also work.)

Please note that this is a general question: my real dataset is much larger, and when collapsing I may want to use different functions for different variables (e.g. last, first, min, max, variance, standard deviation). Hence the list of functions could be quite long.

In this case I would want output of the form:

> df.collapse
   V1 V2 V3 V4
1 egg  2 25  a
2 jap  3 55  b
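For the general case, where each column has its own summary function and the list of functions may be long, one option is to pass a named list of functions (one per column) to a small helper. The sketch below uses only base R (`split`, `lapply`, `do.call(rbind, ...)`). `collapse_by` is a hypothetical name, not a function from any package. It also assumes that each function returns a single value per group.

```r
# Hypothetical helper (not part of base R or plyr): collapse `data` by the
# column named `group`, applying funs[[col]] to each listed column `col`.
collapse_by <- function(data, group, funs) {
  # One sub-data-frame per group; groups come out in sorted order
  pieces <- split(data, data[[group]])
  rows <- lapply(pieces, function(piece) {
    # Summarise each listed column with its own function
    summaries <- lapply(names(funs), function(col) funs[[col]](piece[[col]]))
    names(summaries) <- names(funs)
    # Group value first, then the summaries, as a one-row data frame
    values <- c(setNames(list(piece[[group]][1]), group), summaries)
    as.data.frame(values, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL  # reset row names to 1, 2, ...
  out
}

df <- read.table(text = "egg 1 20 a
egg 2 30 a
jap 3 50 b
jap 1 60 b", stringsAsFactors = FALSE)

# The function list can grow as needed (first, last, min, var, sd, ...)
df.collapse <- collapse_by(df, "V1",
                           list(V2 = max, V3 = mean, V4 = function(x) x[1]))
df.collapse
```

Keeping the functions in a named list means that adding a variable to the collapse is a one-entry change, and the column names in the output follow the list names.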


谜泪 2024-11-24 11:29:10


The plyr package will help you:

library(plyr)
ddply(df, .(V1), summarize, V2 = max(V2), V3 = mean(V3), V4 = toupper(V4)[1])

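As R has no built-in function for the statistical mode (base R's mode() reports an object's mode, such as "numeric" or "character", not its most frequent value), I used a different function here.
But it is easy to implement a mode function.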
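A minimal sketch of such a mode function follows. It uses a common base-R idiom built from `unique`, `match`, `tabulate`, and `which.max`. It works for character columns like V4 as well as for numeric ones. `stat_mode` is a hypothetical name chosen to avoid clashing with base R's `mode()`. Ties are resolved in favour of the value that appears first.

```r
# Hypothetical helper: the statistical mode (most frequent value) of x.
# Base R's mode() is unrelated: it reports the object's mode (e.g. "character").
stat_mode <- function(x) {
  ux <- unique(x)
  # match() maps each element to its position in ux; tabulate() counts them.
  # which.max() picks the first maximum, so ties go to the value seen first.
  ux[which.max(tabulate(match(x, ux)))]
}

df <- read.table(text = "egg 1 20 a
egg 2 30 a
jap 3 50 b
jap 1 60 b", stringsAsFactors = FALSE)

stat_mode(c("a", "b", "b"))
# Mode of V4 within each V1 group
sapply(split(df$V4, df$V1), stat_mode)
```

With plyr loaded, it would slot into the ddply call above as `V4 = stat_mode(V4)`.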


吻泪 2024-11-24 11:29:10

I would suggest using ddply from plyr:

require(plyr)
ddply(df, .(V1), summarise, V2=max(V2), V3=mean(V3), V4=V4[1])

You can replace the functions with any calculation you wish. Your V4 column is non-numeric, so computing its mode needs a mode function that works on character values (or a conversion to numeric first). For now I just return the V4 value from the first row of each group. Or, if you don't want to use plyr:

do.call(rbind, lapply(split(df, df$V1), function(x) {
    # one summary row per V1 group; keep V1 so it matches the desired output
    data.frame(V1=x$V1[1], V2=max(x$V2), V3=mean(x$V3), V4=x$V4[1])
}))
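Another base-R route, not taken from either answer, is to call `aggregate` once per column with that column's own function and then join the per-column results on V1 with `Reduce(merge, ...)`. The sketch below is one way to do this. The named list `funs` is an illustrative assumption and can be extended with as many columns and functions as needed.

```r
df <- read.table(text = "egg 1 20 a
egg 2 30 a
jap 3 50 b
jap 1 60 b", stringsAsFactors = FALSE)

# One summary function per column (this list could be much longer)
funs <- list(V2 = max, V3 = mean, V4 = function(x) x[1])

# Summarise each column separately, grouped by V1 ...
parts <- lapply(names(funs), function(col) {
  aggregate(df[col], by = df["V1"], FUN = funs[[col]])
})
# ... then join the per-column results on V1 (merge sorts by V1)
df.collapse <- Reduce(function(a, b) merge(a, b, by = "V1"), parts)
df.collapse
```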


About the author: 不美如何
