Data aggregation loop in R
I am having trouble aggregating my data to daily values.
I have a data frame from which NAs have been removed (a link to a picture of the data is given below). Data were collected 3 times a day, but because of the removed NAs, some days have only 1 or 2 entries, and on some days data are missing completely.
I am now interested in calculating the daily mean of "dist": that is, summing up the "dist" values of one day and dividing the sum by the number of entries for that day (so by 3 if no data are missing that day). I would like to do this with a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes just 2 or even 1. I would like to tell R that, for every day, it should sum up "dist" and divide it by the number of entries available for that day.
I just have no idea how to formulate a for loop for this purpose. I would really appreciate any advice on this problem. Thanks for your efforts and kind regards,
Jan
Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html
Edit: I used aggregate and tapply as suggested; however, the mean per day was not actually calculated:
```
              Group.1        x
1 2006-10-06 12:00:00 636.5395
2 2006-10-06 20:00:00 859.0109
3 2006-10-07 04:00:00 301.8548
4 2006-10-07 12:00:00 649.3357
5 2006-10-07 20:00:00 944.8272
6 2006-10-08 04:00:00 136.7393
7 2006-10-08 12:00:00 360.9560
8 2006-10-08 20:00:00      NaN
```
The code used was:
```r
dates <- Dis_sub$date
distance <- Dis_sub$dist
aggregate(distance, list(dates), mean, na.rm = TRUE)
tapply(distance, dates, mean, na.rm = TRUE)
```
Comments (3)
Don't use a loop. Use R. Some example data:
and any of:
gives you what you want. See also ?aggregate and ?tapply. If you want a data frame back, you can look at the package plyr:
Keep in mind that ddply does not fully support the date format (yet).
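A minimal base-R sketch of the aggregate() and tapply() route, using a made-up data frame readings (a hypothetical stand-in for Dis_sub with invented values) whose day column already holds plain dates without times. The NA on 2006-10-08 shows that na.rm = TRUE makes mean() divide by the number of available entries rather than by 3:

```r
# Hypothetical stand-in for Dis_sub: a date-only "day" column and up to three
# "dist" readings per day. 2006-10-07 has only two readings, and 2006-10-08
# has one valid reading plus an NA.
readings <- data.frame(
  day  = as.Date(c("2006-10-06", "2006-10-06", "2006-10-06",
                   "2006-10-07", "2006-10-07",
                   "2006-10-08", "2006-10-08")),
  dist = c(10, 20, 30, 5, 15, 7, NA)
)

# aggregate(): a data frame with one row per day (columns "day" and "x").
# mean(..., na.rm = TRUE) divides each day's sum by that day's number of
# non-missing readings, so days with 1, 2 or 3 entries are all handled.
daily_agg <- aggregate(readings$dist, by = list(day = readings$day),
                       FUN = mean, na.rm = TRUE)

# tapply(): the same daily means as a vector named by day.
daily_tap <- tapply(readings$dist, readings$day, mean, na.rm = TRUE)

print(daily_agg)
print(daily_tap)
```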
Look at the data.table package, especially if your data is huge. Here is some code that calculates the mean of dist by day.
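A minimal sketch of the data.table approach, assuming the data.table package is installed and again using a made-up readings table (hypothetical values) whose day column holds dates without times:

```r
library(data.table)

# Hypothetical stand-in for Dis_sub with a date-only "day" column and one NA.
readings <- data.table(
  day  = as.Date(c("2006-10-06", "2006-10-06", "2006-10-06",
                   "2006-10-07", "2006-10-07",
                   "2006-10-08", "2006-10-08")),
  dist = c(10, 20, 30, 5, 15, 7, NA)
)

# Mean of the non-missing dist values for each day; keyby also sorts by day.
daily <- readings[, .(mean_dist = mean(dist, na.rm = TRUE)), keyby = day]
print(daily)
```

If the column still holds full timestamps, grouping with keyby = .(day = as.Date(date, tz = "UTC")) should give the same per-day result, assuming the timestamps are in UTC.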
It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that has just the date, using something like:
Then using Joris Meys' solution (which is the right way to do it) should work.
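A minimal sketch of the date-only column, assuming date is a POSIXct timestamp recorded in UTC and using a made-up data frame readings (hypothetical values) in place of Dis_sub. It first reproduces the one-row-per-timestamp output from the question, then groups by the date alone:

```r
# Hypothetical stand-in for Dis_sub: "date" holds full timestamps (POSIXct, UTC),
# as in the question's output, and "dist" holds the readings.
readings <- data.frame(
  date = as.POSIXct(c("2006-10-06 04:00:00", "2006-10-06 12:00:00",
                      "2006-10-06 20:00:00", "2006-10-07 04:00:00",
                      "2006-10-07 12:00:00", "2006-10-08 04:00:00"),
                    tz = "UTC"),
  dist = c(10, 20, 30, 5, 15, 7)
)

# Grouping by the raw timestamp reproduces the problem from the question:
# every reading is its own group, so each "mean" is just that one value.
by_timestamp <- aggregate(readings$dist, by = list(readings$date), FUN = mean)

# Drop the time of day. tz should match the time zone the timestamps were
# recorded in; otherwise readings near midnight can land on the wrong day.
readings$day <- as.Date(readings$date, tz = "UTC")

# Grouping by the date-only column gives one mean per day.
by_day <- aggregate(readings$dist, by = list(day = readings$day), FUN = mean)

print(by_timestamp)
print(by_day)
```

If date is stored as character instead, as.Date("2006-10-06 12:00:00") also returns just the date part, because the default "%Y-%m-%d" format ignores the trailing time.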
However, if for some reason you really want to use a loop, you could try something like:
But keep in mind that this will be slow and memory-inefficient.
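A minimal sketch of such a loop, assuming a made-up readings data frame (hypothetical values) with POSIXct timestamps in UTC. It preallocates the result, but each pass still scans the whole dist column, so it scales worse than aggregate() or tapply() on large data:

```r
# Hypothetical stand-in for Dis_sub: one day with a single valid reading
# (2006-10-08) and one day with no valid readings at all (2006-10-09).
readings <- data.frame(
  date = as.POSIXct(c("2006-10-06 04:00:00", "2006-10-06 12:00:00",
                      "2006-10-06 20:00:00", "2006-10-07 04:00:00",
                      "2006-10-07 12:00:00", "2006-10-08 04:00:00",
                      "2006-10-08 12:00:00", "2006-10-09 04:00:00"),
                    tz = "UTC"),
  dist = c(10, 20, 30, 5, 15, 7, NA, NA)
)
readings$day <- as.Date(readings$date, tz = "UTC")

days <- sort(unique(readings$day))
# Preallocate the result instead of growing it inside the loop.
daily_mean <- rep(NA_real_, length(days))

for (i in seq_along(days)) {
  # All non-missing readings that fall on the i-th day.
  d <- readings$dist[readings$day == days[i]]
  d <- d[!is.na(d)]
  # Sum divided by however many entries are available (1, 2 or 3);
  # a day with no valid entries stays NA.
  if (length(d) > 0) {
    daily_mean[i] <- sum(d) / length(d)
  }
}

result <- data.frame(day = days, mean_dist = daily_mean)
print(result)
```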