Data aggregation loop in R

Posted on 2024-12-05 05:40:28

I am facing a problem aggregating my data to daily values.
I have a data frame from which NAs have been removed (a link to a picture of the data is given below). Data was collected 3 times a day, but because of the removed NAs there are sometimes only 1 or 2 entries per day, and on some days data is missing completely.

I am now interested in calculating the daily mean of "dist": summing up the "dist" values of one day and dividing the sum by the number of entries for that day (3 if no data is missing that day). I would like to do this with a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes only 2 or even 1. I would like to tell R that, for every day, it should sum up "dist" and divide it by the number of entries available for that day.

I just have no idea how to write a for loop for this purpose. I would really appreciate any advice on this problem. Thanks for your efforts and kind regards,

Jan

Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html

Edit: I used aggregate and tapply as suggested; however, the daily mean was not actually calculated (I still get one row per timestamp):

              Group.1         x
1  2006-10-06 12:00:00  636.5395
2  2006-10-06 20:00:00  859.0109
3  2006-10-07 04:00:00  301.8548
4  2006-10-07 12:00:00  649.3357
5  2006-10-07 20:00:00  944.8272
6  2006-10-08 04:00:00  136.7393
7  2006-10-08 12:00:00  360.9560
8  2006-10-08 20:00:00       NaN

The code used was:

dates<-Dis_sub$date
distance<-Dis_sub$dist
aggregate(distance,list(dates),mean,na.rm=TRUE)
tapply(distance,dates,mean,na.rm=TRUE)


二智少女 2024-12-12 05:40:28

Don't use a loop. Use R's grouping functions instead. Some example data:

dates <- rep(seq(as.Date("2001-01-05"),
                 as.Date("2001-01-20"),
                 by="day"),
             each=3)
values <- rep(1:16,each=3)
values[c(4,5,6,10,14,15,30)] <- NA

and either of:

aggregate(values,list(dates),mean,na.rm=TRUE)

tapply(values,dates,mean,na.rm=TRUE)

gives you what you want. See also ?aggregate and ?tapply.
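To see why mean(..., na.rm = TRUE) already implements the rule from the question (sum of a day's entries divided by the number of entries available that day), the sketch below reuses the example data above and computes the per-day mean, sum and count of non-missing entries side by side. The names daily_mean, daily_sum and daily_n are illustrative only.

```r
# Example data from the answer: 16 days, 3 readings per day, some set to NA.
dates <- rep(seq(as.Date("2001-01-05"), as.Date("2001-01-20"), by = "day"),
             each = 3)
values <- rep(1:16, each = 3)
values[c(4, 5, 6, 10, 14, 15, 30)] <- NA

# mean(x, na.rm = TRUE) is the sum of the non-missing values divided by
# how many non-missing values there are, computed separately for each day.
daily_mean <- tapply(values, dates, mean, na.rm = TRUE)
daily_sum  <- tapply(values, dates, sum, na.rm = TRUE)
daily_n    <- tapply(!is.na(values), dates, sum)  # entries available per day

print(cbind(mean = daily_mean, sum = daily_sum, n = daily_n))
```

A day with no readings at all (2001-01-06 in this data) comes out as NaN, since both its sum and its count are 0.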

If you want a data frame back, you can look at the plyr package:

Data <- data.frame(dates, values)  # one column per vector
library(plyr)

ddply(Data, "dates", summarise, mean_value = mean(values, na.rm = TRUE))

Keep in mind that ddply does not fully support the Date format (yet).
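If you would rather stay in base R, the formula method of aggregate() also returns a data frame. A minimal sketch with the same example data follows. One difference worth knowing: its default na.action = na.omit drops NA rows before grouping, so a day with no readings at all is simply left out instead of appearing as NaN.

```r
# Same example data as above, stored in a data frame.
dates <- rep(seq(as.Date("2001-01-05"), as.Date("2001-01-20"), by = "day"),
             each = 3)
values <- rep(1:16, each = 3)
values[c(4, 5, 6, 10, 14, 15, 30)] <- NA
Data <- data.frame(dates, values)

# Formula interface: one row per day, NA rows removed first (na.omit),
# so days without any reading do not appear in the result.
daily <- aggregate(values ~ dates, data = Data, FUN = mean)
print(daily)
```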

吹梦到西洲 2024-12-12 05:40:28

Look at the data.table package, especially if your data is large. Here is some code that calculates the mean of dist by day:

library(data.table)
dt <- data.table(Data)
# index the data.table copy (dt), not the original data frame
dt[, list(avg_dist = mean(dist, na.rm = TRUE)), by = "date"]

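When the date column still carries a time of day, as in the question, grouping on it directly gives one row per timestamp. A sketch under that assumption strips the time inside `by` and also reports how many non-missing readings each daily mean is based on. The data here is invented for illustration, and tz = "UTC" is an assumed time zone.

```r
library(data.table)

# Hypothetical data in the asker's shape: a timestamp column and "dist".
dt <- data.table(
  date = as.POSIXct(c("2006-10-06 12:00:00", "2006-10-06 20:00:00",
                      "2006-10-07 04:00:00", "2006-10-07 12:00:00"),
                    tz = "UTC"),
  dist = c(10, 20, 30, NA)
)

# Group by calendar day (time dropped in `by`); n counts non-missing dist.
daily <- dt[, .(avg_dist = mean(dist, na.rm = TRUE), n = sum(!is.na(dist))),
            by = .(day = as.Date(date, tz = "UTC"))]
print(daily)
```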
魔法唧唧 2024-12-12 05:40:28

It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that contains just the date, using something like:

Dis_sub$date_only <- as.Date(Dis_sub$date)

Joris Meys' solution (which is the right way to do it) should then work when you group by date_only instead of date.
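A minimal, self-contained check of this idea: the timestamps below are taken from the question's output, but the dist values are invented. Passing tz = "UTC" is an assumption; with real data it should be the time zone the timestamps were recorded in, otherwise readings close to midnight may be assigned to the neighbouring day.

```r
# Hypothetical stand-in for Dis_sub: timestamps from the question's output,
# dist values invented for illustration.
Dis_sub <- data.frame(
  date = as.POSIXct(c("2006-10-06 12:00:00", "2006-10-06 20:00:00",
                      "2006-10-07 04:00:00", "2006-10-07 12:00:00",
                      "2006-10-07 20:00:00", "2006-10-08 04:00:00",
                      "2006-10-08 12:00:00"), tz = "UTC"),
  dist = c(10, 20, 30, 40, 50, 60, NA)
)

# Drop the time of day, giving one Date value per calendar day.
Dis_sub$date_only <- as.Date(Dis_sub$date, tz = "UTC")

# Grouping on date_only now yields one row per day instead of per timestamp.
daily <- aggregate(Dis_sub$dist, list(day = Dis_sub$date_only), mean,
                   na.rm = TRUE)
print(daily)
```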

However, if for some reason you really want to use a loop, you could try something like:

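days <- unique(Dis_sub$date_only)  # calendar days, not timestamps
newFrame <- data.frame()
# loop over indices: `for (d in days)` would drop the Date class
for (i in seq_along(days)) {
    meanDist <- mean(Dis_sub$dist[Dis_sub$date_only == days[i]], na.rm = TRUE)
    newFrame <- rbind(newFrame, data.frame(date = days[i], mean_dist = meanDist))
}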

But keep in mind that this will be slow and memory-inefficient, because newFrame is copied and grown on every iteration.
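If a loop is required anyway, one way to avoid growing a data frame is to preallocate the result vectors and build the data frame once at the end. The sketch below does that and spells out the question's own rule (sum of the day's entries divided by the number of available entries). The small Dis_sub here is invented for illustration and is assumed to already have the date_only column.

```r
# Hypothetical data in the asker's shape, times already stripped to dates.
Dis_sub <- data.frame(
  date_only = as.Date(c("2006-10-06", "2006-10-06", "2006-10-07",
                        "2006-10-07", "2006-10-07", "2006-10-08")),
  dist = c(10, 20, 30, 40, NA, NA)
)

days <- sort(unique(Dis_sub$date_only))
mean_dist <- numeric(length(days))  # preallocated, filled in place
n_obs <- integer(length(days))

for (i in seq_along(days)) {
  d <- Dis_sub$dist[Dis_sub$date_only == days[i]]
  n_obs[i] <- sum(!is.na(d))  # entries available that day
  # day's sum / available entries; 0 / 0 gives NaN for a day with no data
  mean_dist[i] <- sum(d, na.rm = TRUE) / n_obs[i]
}

result <- data.frame(day = days, mean_dist = mean_dist, n = n_obs)
print(result)
```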
