Loop for data aggregation in R

Posted on 2024-12-05 05:40:28

I have a problem aggregating my data to daily values.
I have a data frame from which NAs have been removed (a link to a picture of the data is given below). Data was collected 3 times a day, but sometimes, due to NAs, there are just 1 or 2 entries per day; on some days the data is missing completely.

I am now interested in calculating the daily mean of "dist": this means summing up the values of "dist" for one day and dividing by the number of entries for that day (so 3 if no data is missing that day). I would like to do this via a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes just 2 or even 1. I would like to tell R that, for every day, it should sum up "dist" and divide it by the number of entries available for that day.

I just have no idea how to formulate a for loop for this purpose. I would really appreciate it if you could give me any advice on this problem. Thanks for your efforts and kind regards,

Jan

Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html

Edit: I used aggregate and tapply as suggested; however, the daily mean was not actually calculated:

              Group.1         x
1  2006-10-06 12:00:00  636.5395
2  2006-10-06 20:00:00  859.0109
3  2006-10-07 04:00:00  301.8548
4  2006-10-07 12:00:00  649.3357
5  2006-10-07 20:00:00  944.8272
6  2006-10-08 04:00:00  136.7393
7  2006-10-08 12:00:00  360.9560
8  2006-10-08 20:00:00       NaN

The code used was:

dates<-Dis_sub$date
distance<-Dis_sub$dist
aggregate(distance,list(dates),mean,na.rm=TRUE)
tapply(distance,dates,mean,na.rm=TRUE)

二智少女 2024-12-12 05:40:28

Don't use a loop. Use R. Some example data:

dates <- rep(seq(as.Date("2001-01-05"),
                 as.Date("2001-01-20"),
                 by="day"),
             each=3)
values <- rep(1:16,each=3)
values[c(4,5,6,10,14,15,30)] <- NA

and either of:

aggregate(values,list(dates),mean,na.rm=TRUE)

tapply(values,dates,mean,na.rm=TRUE)

gives you what you want. See also ?aggregate and ?tapply.
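
As a quick check that mean with na.rm=TRUE matches the definition in the question (the sum of the available entries divided by their count), here is a small sketch on the example data above. The names sums, counts, manual and daily are made up for illustration.

```r
# Rebuild the example data so the snippet runs on its own
dates <- rep(seq(as.Date("2001-01-05"), as.Date("2001-01-20"), by = "day"),
             each = 3)
values <- rep(1:16, each = 3)
values[c(4, 5, 6, 10, 14, 15, 30)] <- NA

# Sum of the non-missing values per day
sums <- tapply(values, dates, sum, na.rm = TRUE)
# Number of non-missing values per day
counts <- tapply(!is.na(values), dates, sum)
# Manual daily mean; a day with no entries gives 0/0, which is NaN
manual <- sums / counts

# The one-step version from the answer
daily <- tapply(values, dates, mean, na.rm = TRUE)

# Compare the two results
print(all.equal(manual, daily))
```

A day with no entries at all comes out as NaN in both versions, which is likely where the NaN in the question's output comes from.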

If you want a data frame back, you can look at the package plyr:

Data <- data.frame(dates, values)
library(plyr)

# summarise computes the mean of values within each date group
ddply(Data, "dates", summarise, mean_value = mean(values, na.rm = TRUE))

Keep in mind that ddply does not fully support the Date format (yet).

吹梦到西洲 2024-12-12 05:40:28

Look at the data.table package, especially if your data is huge. Here is some code that calculates the mean of dist by day.

library(data.table)
dt <- data.table(Data)
# group by the date column and average dist within each group
dt[, list(avg_dist = mean(dist, na.rm = TRUE)), by = "date"]

魔法唧唧 2024-12-12 05:40:28

It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that has just the date using something like

Dis_sub$date_only <- as.Date(Dis_sub$date)

Then using Joris Meys' solution (which is the right way to do it) should work.
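
To illustrate why stripping the time matters, here is a minimal sketch with a made-up stand-in for Dis_sub. The real data is only available as an image, so the values are copied from the output in the question, and storing the timestamps as character strings is an assumption. The names by_stamp, by_day and day are also illustrative.

```r
# Hypothetical stand-in for Dis_sub; timestamps stored as character strings
Dis_sub <- data.frame(
  date = c("2006-10-06 12:00:00", "2006-10-06 20:00:00",
           "2006-10-07 04:00:00", "2006-10-07 12:00:00",
           "2006-10-07 20:00:00", "2006-10-08 04:00:00",
           "2006-10-08 12:00:00", "2006-10-08 20:00:00"),
  dist = c(636.5395, 859.0109, 301.8548, 649.3357,
           944.8272, 136.7393, 360.9560, NA),
  stringsAsFactors = FALSE
)

# Grouping by the full timestamp gives one group per row,
# so nothing is actually averaged
by_stamp <- aggregate(Dis_sub$dist, list(Dis_sub$date), mean, na.rm = TRUE)

# Keep only the calendar day, then group by that
Dis_sub$date_only <- as.Date(Dis_sub$date)
by_day <- aggregate(Dis_sub$dist, list(day = Dis_sub$date_only),
                    mean, na.rm = TRUE)
print(by_day)
```

If date is a POSIXct column rather than text, as.Date has a tz argument that is worth checking so that entries near midnight land on the intended day.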

However if for some reason you really want to use a loop you could try something like

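days <- unique(Dis_sub$date_only)
newFrame <- data.frame()
for (i in seq_along(days)) {
    d <- days[i]  # index by position: for (d in days) would drop the Date class
    meanDist <- mean(Dis_sub$dist[Dis_sub$date_only == d], na.rm = TRUE)
    newFrame <- rbind(newFrame, data.frame(date = d, meanDist = meanDist))
}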

But keep in mind that this will be slow and memory-inefficient.
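
Much of the cost in a loop like this comes from growing the result with rbind on every pass. Below is a hedged sketch of the same idea with the result vector allocated once up front. The toy Dis_sub and its values are made up for illustration and are not taken from the question.

```r
# Hypothetical toy data with a date-only column already in place
Dis_sub <- data.frame(
  date_only = as.Date(c("2006-10-06", "2006-10-06", "2006-10-07",
                        "2006-10-07", "2006-10-07", "2006-10-08")),
  dist = c(10, 20, 3, 6, NA, 5)
)

days <- unique(Dis_sub$date_only)
meanDist <- numeric(length(days))  # allocate the result once

for (i in seq_along(days)) {
  # Index by position so that days[i] keeps its Date class
  meanDist[i] <- mean(Dis_sub$dist[Dis_sub$date_only == days[i]],
                      na.rm = TRUE)
}

# Build the data frame a single time, after the loop
newFrame <- data.frame(date = days, meanDist = meanDist)
print(newFrame)
```

This keeps the loop the question asked for, but aggregate or tapply remain the shorter route.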
