Calculating the mean of rows
I have a data frame called ants detailing multiple entries per site. It looks like this:
  Site     Date     Time  Temp SpCond Salinity Depth Turbidity Chlorophyll
1   71 6/8/2010 14:50:35 14.32  49.88    32.66 0.397       0.0         1.3
2   71 6/8/2010 14:51:00 14.31  49.94    32.70 1.073       0.0         2.0
3   71 6/8/2010 14:51:16 14.32  49.95    32.71 1.034      -0.1         1.6
4   71 6/8/2010 14:51:29 14.31  49.96    32.71 1.030      -0.2         1.6
5   70 6/8/2010 14:53:55 14.30  50.04    32.77 1.002      -0.2         1.2
6   70 6/8/2010 14:54:09 14.30  50.03    32.77 0.993      -0.5         1.2
Sites have different numbers of entries, usually 3 but sometimes fewer or more. Where both date and site number match, I would like to write a new data frame with one entry per site giving the mean reading for each parameter. I would like empty or "na" cells to be omitted from the calculation and from the resulting data frame.
I'm not sure whether this calls for an apply function or maybe a version of rowMeans. I'm very stuck, so any help is much appreciated!
Comments (5)
Nico's answer looks like what mine would have been, except that I would add a named argument to be passed on to mean() so that NAs in the aggregated columns do not sabotage the results. (I could not tell whether the OP meant that NAs were known or suspected in the by variables or in the other variables.)
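A minimal sketch of what passing such a named argument might look like, assuming a cut-down version of the question's ants data frame. The Depth NA, the reduced column set, and the names measure_cols and site_means are made up for illustration:

```r
# Cut-down stand-in for the question's `ants` data frame.
# One Depth value is set to NA (an assumption) to show the effect of na.rm.
ants <- data.frame(
  Site  = c(71, 71, 71, 71, 70, 70),
  Date  = rep("6/8/2010", 6),
  Temp  = c(14.32, 14.31, 14.32, 14.31, 14.30, 14.30),
  Depth = c(0.397, 1.073, NA, 1.030, 1.002, 0.993)
)

# Columns to average (hypothetical helper name).
measure_cols <- c("Temp", "Depth")

# Extra named arguments to aggregate() are passed on to FUN,
# so na.rm = TRUE reaches mean() and NAs are skipped per column.
site_means <- aggregate(
  ants[measure_cols],
  by    = list(Site = ants$Site, Date = ants$Date),
  FUN   = mean,
  na.rm = TRUE
)
print(site_means)
```

Without na.rm = TRUE, the Depth mean for site 71 would come out as NA.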
You would probably also want to run a matching aggregate or tapply call alongside it to count the number of non-NA values behind each mean.
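Counting the non-NA values per group could be sketched as follows, again on a made-up subset of the ants data (the NA and the result names non_na_counts and depth_counts are assumptions):

```r
# Cut-down stand-in for `ants`, with one NA inserted for illustration.
ants <- data.frame(
  Site  = c(71, 71, 71, 71, 70, 70),
  Date  = rep("6/8/2010", 6),
  Temp  = c(14.32, 14.31, 14.32, 14.31, 14.30, 14.30),
  Depth = c(0.397, 1.073, NA, 1.030, 1.002, 0.993)
)

# Helper that counts the non-missing entries of a vector.
count_non_na <- function(x) sum(!is.na(x))

# Same grouping as for the means, but counting instead of averaging.
non_na_counts <- aggregate(
  ants[c("Temp", "Depth")],
  by  = list(Site = ants$Site, Date = ants$Date),
  FUN = count_non_na
)
print(non_na_counts)

# tapply() does the same for a single column and returns a Site x Date table.
depth_counts <- tapply(ants$Depth, list(ants$Site, ants$Date), count_non_na)
print(depth_counts)
```

The counts show how many readings each mean is actually based on, which matters when some groups have only one or two usable values.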
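The other approach, aggregate's formula method, may give different results because na.action = na.omit is its default, so any row with an NA in a variable used by the formula is dropped before the means are computed.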
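To illustrate the difference, here is a sketch on a made-up subset of ants with one NA in Depth. The data values and the names formula_means and percol_means are assumptions for the example:

```r
# Cut-down stand-in for `ants`; row 3 has a missing Depth (an assumption).
ants <- data.frame(
  Site  = c(71, 71, 71, 71, 70, 70),
  Date  = rep("6/8/2010", 6),
  Temp  = c(14.32, 14.31, 14.32, 14.31, 14.30, 14.30),
  Depth = c(0.397, 1.073, NA, 1.030, 1.002, 0.993)
)

# Formula method: the default na.action = na.omit removes row 3 entirely,
# so its Temp reading is lost along with the missing Depth.
formula_means <- aggregate(cbind(Temp, Depth) ~ Site + Date, data = ants, FUN = mean)
print(formula_means)

# Data-frame method with na.rm = TRUE: NAs are skipped column by column,
# so row 3 still contributes to the Temp mean.
percol_means <- aggregate(
  ants[c("Temp", "Depth")],
  by    = list(Site = ants$Site, Date = ants$Date),
  FUN   = mean,
  na.rm = TRUE
)
print(percol_means)
```

The two calls agree on Depth but not on Temp for site 71, because the formula method averages three Temp readings and the per-column version averages four.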
Here is one way using the plyr package and its ddply() function. I used a custom anonymous function to skip the first three columns.
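A sketch of how such a ddply() call might look, assuming plyr is installed and that Site, Date and Time are the first three columns. The sample data (including the NA) and the use of colMeans() inside the anonymous function are assumptions, not necessarily what the answer used:

```r
library(plyr)

# Cut-down stand-in for `ants`; the first three columns are non-numeric keys.
ants <- data.frame(
  Site  = c(71, 71, 71, 70, 70),
  Date  = rep("6/8/2010", 5),
  Time  = c("14:50:35", "14:51:00", "14:51:16", "14:53:55", "14:54:09"),
  Temp  = c(14.32, 14.31, 14.32, 14.30, 14.30),
  Depth = c(0.397, NA, 1.034, 1.002, 0.993)
)

# For each Site/Date group, drop the first three columns and average the rest.
# Returning a one-row data frame keeps one output row per group.
site_means <- ddply(ants, .(Site, Date), function(d) {
  data.frame(t(colMeans(d[, -(1:3)], na.rm = TRUE)))
})
print(site_means)
```

ddply() adds the grouping columns back to the result, so the output has Site and Date followed by one mean per measurement column.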
You can also use aggregate.
Here is a completely new answer, with a full log, that also covers your new specification:
You were close with rowMeans(), but you need colMeans() instead. The others have shown how to use built-in or add-on functionality, and I would certainly recommend you use them. However, it might be useful to see how to do something like this by hand, applying colMeans() to each site-and-date group and then doing some extra tidying if you want the output to be as nice as in the others' answers.
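One way the by-hand route could be sketched, using split() and lapply() around colMeans(). The sample data, the NA, and the names groups, means, keys and result are assumptions for illustration:

```r
# Cut-down stand-in for `ants`; the first three columns are non-numeric keys.
ants <- data.frame(
  Site  = c(71, 71, 71, 71, 70, 70),
  Date  = rep("6/8/2010", 6),
  Time  = c("14:50:35", "14:51:00", "14:51:16", "14:51:29", "14:53:55", "14:54:09"),
  Temp  = c(14.32, 14.31, 14.32, 14.31, 14.30, 14.30),
  Depth = c(0.397, 1.073, NA, 1.030, 1.002, 0.993)
)

# Step 1: one data frame per Site/Date combination (empty combinations dropped).
groups <- split(ants, list(ants$Site, ants$Date), drop = TRUE)

# Step 2: column means of the measurement columns within each group.
means <- lapply(groups, function(d) colMeans(d[, -(1:3)], na.rm = TRUE))

# Step 3 (tidying): recover the keys, bind everything into one data frame,
# and reset the row names left over from split().
keys <- do.call(rbind, lapply(groups, function(d) d[1, c("Site", "Date")]))
result <- cbind(keys, do.call(rbind, means))
rownames(result) <- NULL
print(result)
```

Most of the code is the tidying in step 3, which is the part the canned functions handle for you.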
Note again that in most cases you should use the canned functions described in the other answers. Sometimes, however, it might be quicker to cook up your own solution, and the above might act as a guide to doing so.