How to collapse a data frame by some variables, taking the mean of the others

Posted 2024-08-27 20:42:36

I need to summarize a data frame by some variables, ignoring the others. This is sometimes called collapsing. For example, given a data frame like this:

Widget Type Energy  
egg 1 20  
egg 2 30  
jap 3 50  
jap 1 60

then collapsing by Widget, with Energy as the dependent variable (Energy ~ Widget), would yield the mean Energy for each Widget:

Widget Energy  
egg  25  
jap  55

In Excel the closest functionality might be pivot tables. I've worked out how to do it in Python (http://alexholcombe.wordpress.com/2009/01/26/summarizing-data-by-combinations-of-variables-with-python/), and here's an R example that uses the doBy library to do something closely related (http://www.mail-archive.com/[email protected]/msg02643.html). But is there an easy way to do the above? Even better, is there anything built into the ggplot2 library to create plots that collapse across some variables?

恋你朝朝暮暮 2024-09-03 20:42:37

If you are familiar with SQL, another way to manipulate data frames is the sqldf function in the sqldf package:

library(sqldf)
sqldf("SELECT Widget, avg(Energy) FROM yourDataFrame GROUP BY Widget")

不奢求什么 2024-09-03 20:42:37

@Jyotirmoy mentioned that this can be done with the plyr library. Here is what that would look like:

DF <- read.table(text=
"Widget Type Energy  
egg 1 20  
egg 2 30  
jap 3 50  
jap 1 60", header=TRUE)

library("plyr")
ddply(DF, .(Widget), summarise, Energy=mean(Energy))

which gives

> ddply(DF, .(Widget), summarise, Energy=mean(Energy))
  Widget Energy
1    egg     25
2    jap     55

手心的温暖 2024-09-03 20:42:36

Use aggregate to summarize across a factor:

> df<-read.table(textConnection('
+ egg 1 20
+ egg 2 30
+ jap 3 50
+ jap 1 60'))
> aggregate(df$V3,list(df$V1),mean)
  Group.1  x
1     egg 25
2     jap 55
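
As a complement to the positional call above, here is a minimal sketch using the formula method of aggregate, which mirrors the Energy ~ Widget notation from the question and keeps the column names. It assumes the data are read with a header row; the names DF, by_widget and by_widget_type are chosen for illustration only.

```r
# Example data from the question, read with column names
DF <- read.table(text = "Widget Type Energy
egg 1 20
egg 2 30
jap 3 50
jap 1 60", header = TRUE)

# Mean Energy for each Widget; Type is ignored
by_widget <- aggregate(Energy ~ Widget, data = DF, FUN = mean)
print(by_widget)

# Collapse by more than one variable by adding terms on the right-hand side
by_widget_type <- aggregate(Energy ~ Widget + Type, data = DF, FUN = mean)
print(by_widget_type)
```

The first call should reproduce the table asked for in the question, with egg at 25 and jap at 55. The second call has nothing to average here, because every Widget and Type combination occurs only once in the sample data.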

For more flexibility, look at the tapply function and the plyr package.
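
A rough sketch of the tapply route, assuming the same example data read with a header row. The names df2, means and collapsed are illustrative and do not come from the original answer.

```r
# Example data from the question, read with column names
df2 <- read.table(text = "Widget Type Energy
egg 1 20
egg 2 30
jap 3 50
jap 1 60", header = TRUE)

# tapply applies mean to Energy within each level of Widget
# and returns a named one-dimensional array
means <- tapply(df2$Energy, df2$Widget, mean)
print(means)

# Rebuild a data frame if a table shape is needed downstream
collapsed <- data.frame(Widget = names(means), Energy = as.vector(means))
print(collapsed)
```

tapply is convenient when a named vector is enough; aggregate or plyr is usually the easier choice when the result should stay a data frame.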

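In ggplot2, use stat_summary to compute the summary while plotting. ggplot2 3.3.0 and later take the function as fun (older versions used fun.y), and qplot no longer accepts a stat argument:

library(ggplot2)
ggplot(df, aes(V1, V3)) + stat_summary(fun = mean, geom = "bar", width = 0.4)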

About the author

蒗幽
