Merging aggregated data in R

Posted 2024-10-27 03:29:16

Following up on my previous question about aggregating hourly data into daily data, I want to continue with (a) a monthly aggregate and (b) merging the monthly aggregate into the original data frame.

My original dataframe looks like this:

Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"

The daily aggregation was answered in my previous question, and from there I can work out the monthly aggregates, which look something like this:

Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"

Here OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperatures. What I want in the end is something like this:

Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"

I don't know enough R to do this. Any help is greatly appreciated.
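
A minimal base-R sketch of the monthly step, assuming only the hourly sample above: it averages each day's (min + max) / 2 within each calendar month and, for comparison, also takes the plain monthly mean of the hourly readings. With this sample the OutdoorAVE values shown (31.33, 31.67, 31.33) match the plain mean of the hourly readings, while the daily min/max midpoint gives 31.5, 33 and 30.5. The names hourly, daily, Mid and YearMonth are illustrative, not from the post.

```r
# Hourly sample data from the question
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
hourly <- read.csv(text = Lines, stringsAsFactors = FALSE)

# Parse timestamps in UTC so daylight-saving changes cannot shift any hour
stamp <- as.POSIXct(hourly$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")
hourly$Day <- as.Date(stamp)
hourly$YearMonth <- format(stamp, "%Y-%m")  # year and month, so years are not pooled

# Daily minimum and maximum outdoor temperature, one row per day
dmin <- aggregate(Outdoor ~ Day + YearMonth, data = hourly, FUN = min)
dmax <- aggregate(Outdoor ~ Day + YearMonth, data = hourly, FUN = max)
daily <- merge(dmin, dmax, by = c("Day", "YearMonth"), suffixes = c(".min", ".max"))
daily$Mid <- (daily$Outdoor.min + daily$Outdoor.max) / 2

# Monthly mean of the daily (min + max) / 2 midpoints
monthly_minmax <- aggregate(Mid ~ YearMonth, data = daily, FUN = mean)
# Plain monthly mean of the hourly readings, for comparison
monthly_plain <- aggregate(Outdoor ~ YearMonth, data = hourly, FUN = mean)

print(monthly_minmax)
print(monthly_plain)
```

Which of the two definitions is wanted decides whether the merged column should come from monthly_minmax or monthly_plain; the merge step afterwards is the same either way.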


Comments (3)

爱已欠费 2024-11-03 03:29:16

Try ave, with e.g. POSIXlt to extract the month:

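# Read the hourly data from the question
zz <- textConnection(Lines)
Data <- read.table(zz, header = TRUE, sep = ",", stringsAsFactors = FALSE)
close(zz)

# Abbreviated month name ("Jan", "Feb", ... in the current locale)
Data$Month <- strftime(
     as.POSIXlt(Data$Date, format = "%m/%d/%Y %H:%M"),
     format = '%b')
# ave() returns each row's group mean, repeated over the rows of that group
Data$outdoor_ave <- ave(Data$Outdoor, Data$Month, FUN = mean)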

Gives:

> Data
              Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00      30     25   Jan    31.33333
2 01/01/2000 02:00      31     26   Jan    31.33333
3 01/01/2000 03:00      33     24   Jan    31.33333
4 02/01/2000 01:00      29     25   Feb    31.66667
5 02/01/2000 02:00      27     26   Feb    31.66667
6 02/01/2000 03:00      39     24   Feb    31.66667
7 12/01/2000 02:00      27     26   Dec    31.33333
8 12/01/2000 03:00      39     24   Dec    31.33333
9 12/31/2000 23:00      28     25   Dec    31.33333
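
The ave() call above groups on the month abbreviation alone, so with several years of data every January would be averaged together, and the %b output depends on the locale. A hedged base-R variant, assuming the same Lines string: group on a year-month key and take the label from the built-in month.abb vector. The January 2001 row is hypothetical, added only to show the two years staying apart.

```r
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
# Hypothetical extra row from January 2001 (not in the question's data)
Lines <- paste(Lines, "01/15/2001 12:00,10,20", sep = "\n")

Data <- read.csv(text = Lines, stringsAsFactors = FALSE)
stamp <- as.POSIXlt(Data$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")
Data$Month <- month.abb[stamp$mon + 1]       # "Jan", "Feb", ... in any locale
Data$YearMonth <- format(stamp, "%Y-%m")      # grouping key: year and month
Data$OutdoorAVE <- ave(Data$Outdoor, Data$YearMonth, FUN = mean)
Data
```

The January 2001 row gets 10 on its own; grouping on Data$Month instead would give all four January rows 26, i.e. (30 + 31 + 33 + 10) / 4.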

Edit: then just calculate Month in Data as shown above and use merge:

zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz, header = TRUE, sep = ",", stringsAsFactors = FALSE)
close(zz)

> merge(Data, Data2[-1], all = TRUE)
  Month             Date Outdoor Indoor OutdoorAVE
1   Dec 12/01/2000 02:00      27     26      31.33
2   Dec 12/01/2000 03:00      39     24      31.33
3   Dec 12/31/2000 23:00      28     25      31.33
4   Feb 02/01/2000 01:00      29     25      31.67
5   Feb 02/01/2000 02:00      27     26      31.67
6   Feb 02/01/2000 03:00      39     24      31.67
7   Jan 01/01/2000 01:00      30     25      31.33
8   Jan 01/01/2000 02:00      31     26      31.33
9   Jan 01/01/2000 03:00      33     24      31.33
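
merge() returns the rows sorted by the key column, which is why December comes first in the output above. A sketch that restores the original hourly order, assuming the question's Lines and Lines2 strings: a temporary row index (row_id, a name introduced here) is carried through the merge and sorted on afterwards. It uses all.x = TRUE rather than all = TRUE, so a monthly row with no matching hours does not add an otherwise empty row.

```r
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
hourly  <- read.csv(text = Lines,  stringsAsFactors = FALSE)
monthly <- read.csv(text = Lines2, stringsAsFactors = FALSE)

# month.abb gives English abbreviations regardless of locale, matching Lines2
stamp <- as.POSIXlt(hourly$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")
hourly$Month <- month.abb[stamp$mon + 1]

# Remember the original row order; merge() sorts its result by the key
hourly$row_id <- seq_len(nrow(hourly))
out <- merge(hourly, monthly[c("Month", "OutdoorAVE")], by = "Month", all.x = TRUE)

# Restore the hourly order and the column layout asked for in the question
out <- out[order(out$row_id), c("Date", "Outdoor", "Indoor", "Month", "OutdoorAVE")]
rownames(out) <- NULL
out
```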

浮华 2024-11-03 03:29:16

This is tangential to your question, but you may want to use RSQLite with separate tables for the various aggregate values instead, and join the tables with simple SQL queries. If you compute many kinds of aggregates, your data frame can easily get large and unwieldy.
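
A minimal sketch of that suggestion, assuming the DBI and RSQLite packages are installed: the hourly rows and the monthly aggregates live in separate tables of an in-memory SQLite database and are joined in SQL. The table names hourly and monthly, and the three-row excerpt of the data, are illustrative.

```r
library(DBI)  # RSQLite must also be installed; it provides the SQLite() driver

# Small excerpt of the hourly data, with the month already computed
hourly <- data.frame(
  Date    = c("01/01/2000 01:00", "01/01/2000 02:00", "02/01/2000 01:00"),
  Outdoor = c(30, 31, 29),
  Indoor  = c(25, 26, 25),
  Month   = c("Jan", "Jan", "Feb"),
  stringsAsFactors = FALSE
)
# Monthly aggregates kept in their own table
monthly <- data.frame(Month = c("Jan", "Feb"), OutdoorAVE = c(31.33, 31.67),
                      stringsAsFactors = FALSE)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "hourly", hourly)
dbWriteTable(con, "monthly", monthly)

# Join on Month; rowid keeps the hourly rows in insertion order
joined <- dbGetQuery(con, "
  SELECT h.Date, h.Outdoor, h.Indoor, h.Month, m.OutdoorAVE
  FROM hourly AS h
  LEFT JOIN monthly AS m ON h.Month = m.Month
  ORDER BY h.rowid")
dbDisconnect(con)
joined
```

Adding another kind of aggregate then means another table and another JOIN, rather than another set of repeated columns in the hourly data frame.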

梨涡少年 2024-11-03 03:29:16

Here's a zoo/xts solution. Note that Month is numeric here (zero-based, as returned by .indexmon, so 0 = January and 11 = December) because you can't mix types in zoo/xts objects.

library(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
    format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)

zz <- merge(z, Month=.indexmon(z),
    OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
#                     Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00      30     25     0   31.33333
# 2000-01-01 02:00:00      31     26     0   31.33333
# 2000-01-01 03:00:00      33     24     0   31.33333
# 2000-02-01 01:00:00      29     25     1   31.66667
# 2000-02-01 02:00:00      27     26     1   31.66667
# 2000-02-01 03:00:00      39     24     1   31.66667
# 2000-12-01 02:00:00      27     26    11   31.33333
# 2000-12-01 03:00:00      39     24    11   31.33333
# 2000-12-31 23:00:00      28     25    11   31.33333
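
Because .indexmon() numbers months from 0, the Month column above reads 0, 1 and 11. Once the data leave zoo/xts (for example in a data frame), the numbers can be mapped back to names. A base-R sketch, assuming only the zero-based convention visible in the output above; the mon field of POSIXlt uses the same 0 to 11 numbering.

```r
# Example timestamps taken from the question's data (UTC to keep it deterministic)
stamps <- as.POSIXct(c("2000-01-01 01:00", "2000-02-01 01:00", "2000-12-31 23:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
# POSIXlt counts months from 0, the same convention .indexmon() uses
mon0 <- as.POSIXlt(stamps)$mon
df <- data.frame(Time = stamps, Outdoor = c(30, 29, 28), Month = mon0)
# Shift by one to index the built-in English abbreviations in month.abb
df$MonthName <- month.abb[df$Month + 1]
df
```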

Update: how to get the above result from the two separate data sets:

Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
    FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)

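# Union-merge the monthly values into the hourly series z, carry each value
# forward with na.locf(), then keep only the original hourly timestamps
zz2 <- na.locf(merge(z, OutdoorAVE=z2))[index(z)]
# Add the zero-based month and restore the column order of zz
zz2 <- merge(zz2[, c("Outdoor", "Indoor")], Month=.indexmon(zz2),
    OutdoorAVE=zz2[, "OutdoorAVE"])
# same output as zz (above), up to the rounding in Lines2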
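
The na.locf() step amounts to "each hour takes the value of the most recent monthly row at or before it". A base-R sketch of the same alignment, swapping in findInterval() for the zoo merge and assuming the question's Lines and Lines2 strings; it is an alternative, not the answer's method. Like na.locf(), it carries the last value forward, so an hour in a month missing from Lines2 would silently receive the previous month's value.

```r
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
hourly  <- read.csv(text = Lines,  stringsAsFactors = FALSE)
monthly <- read.csv(text = Lines2, stringsAsFactors = FALSE)

h_time <- as.POSIXct(hourly$Date,  format = "%m/%d/%Y %H:%M", tz = "UTC")
m_time <- as.POSIXct(monthly$Date, format = "%m/%d/%Y",       tz = "UTC")

# findInterval() needs ascending breakpoints, so sort the monthly rows first
ord <- order(m_time)
vals <- monthly$OutdoorAVE[ord]
# idx[i] is the last monthly row whose start is <= hour i (0 if none)
idx <- findInterval(as.numeric(h_time), as.numeric(m_time[ord]))
# Index with NA instead of 0 so hours before the first month get NA
hourly$OutdoorAVE <- vals[ifelse(idx == 0, NA, idx)]
hourly
```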