Row means across multiple columns in sequence
I have monthly state unemployment rate data from 2017 to 2019, and 2022. I want to get the yearly average unemployment rate for each state.
Is there a way to use sequences or for loops to get the mean of each consecutive block of 12 columns, starting from column 2? For 2022, it would only be the mean across three columns (January to March).
I currently have the code below, but this is terribly inefficient, especially when I start working with much larger data sets.
EDIT
# Edit: Below shows unemployment rates for year 2017 (not showing 2018-2022)
> df[2:13]
Jan.2017 Feb.2017 Mar.2017 Apr.2017 May.2017 Jun.2017 Jul.2017 Aug.2017 Sep.2017 Oct.2017 Nov.2017 Dec.2017
1 5.5 5.2 5.0 4.8 4.6 4.4 4.3 4.2 4.1 4.0 4.0 4.0
2 6.6 6.6 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.4
3 5.2 5.2 5.1 5.0 5.0 4.9 4.9 4.8 4.9 4.9 4.9 4.9
4 3.8 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.8 3.8 3.8 3.8
# I am using the below to get the average from Jan2017-Dec2017, Jan 2018-Dec 2018 etc.
df$x2017 <- rowMeans(df[ , c(2:13)], na.rm=TRUE)
df$x2018 <- rowMeans(df[ , c(14:25)], na.rm=TRUE)
df$x2019 <- rowMeans(df[ , c(26:37)], na.rm=TRUE)
df$x2021 <- rowMeans(df[ , c(38:49)], na.rm=TRUE)
df$x2022 <- rowMeans(df[ , c(50:52)], na.rm=TRUE)
# output
State x2017 x2018 x2019 x2021 x2022
1 Alabama 8.0 7.2 6.6 6.1 5.9
2 Alaska 7.2 7.0 6.6 6.3 6.5
3 Arizona 8.3 7.7 6.7 6.0 5.6
4 Arkansas 7.2 6.9 5.7 4.7 4.0
I’m just looking for something that reduces potential for errors when telling it which columns to get the averages for.
Made-up df, if needed:
state = c("Alabama", "Alaska", "Arizona")
Jan2017 = c(1:3)
Feb2017 = c(4:6)
Jan2018 = c(7:9)
Feb2018 = c(10:12)
Jan2019 = c(13:15)
Feb2019 = c(16:18)
df3=data.frame(state,Jan2017,Feb2017,Jan2018,Feb2018,Jan2019,Feb2019)
> df3
state Jan2017 Feb2017 Jan2018 Feb2018 Jan2019 Feb2019
1 Alabama 1 4 7 10 13 16
2 Alaska 2 5 8 11 14 17
3 Arizona 3 6 9 12 15 18
Comments (3)
Here is a tidyverse solution with pivoting and summarizing:
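A minimal sketch of what a pivot-and-summarize approach could look like on the made-up df3 from the question, assuming the dplyr and tidyr packages are installed. It is not necessarily the code originally posted; the names month_year, rate, avg and yearly are illustrative, and the year is assumed to be the last four digits of each column name.

```r
library(dplyr)
library(tidyr)

# Made-up data from the question
df3 <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona"),
  Jan2017 = 1:3,   Feb2017 = 4:6,
  Jan2018 = 7:9,   Feb2018 = 10:12,
  Jan2019 = 13:15, Feb2019 = 16:18
)

yearly <- df3 %>%
  # Long format: one row per state and month column
  pivot_longer(-state, names_to = "month_year", values_to = "rate") %>%
  # Assumption: the year is the last four digits of the column name
  mutate(year = sub(".*([0-9]{4})$", "x\\1", month_year)) %>%
  # Average over however many months each year actually has
  group_by(state, year) %>%
  summarise(avg = mean(rate, na.rm = TRUE), .groups = "drop") %>%
  # Back to wide format: one column per year
  pivot_wider(names_from = year, values_from = avg)

yearly
```

Because the grouping comes from the column names rather than column positions, a partial year such as Jan.2022 to Mar.2022 is averaged over its three months without any special handling.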
Here is another slightly different tidyverse option:
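One possible variant, again only a sketch on the made-up df3 and not necessarily the code originally posted: the month and year can be split while pivoting by passing a regular expression to names_pattern, so no separate mutate step is needed. The names month, year, rate, avg and yearly2 are illustrative, and the pattern assumes column names such as Jan2017 or Jan.2017.

```r
library(dplyr)
library(tidyr)

# Made-up data from the question
df3 <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona"),
  Jan2017 = 1:3,   Feb2017 = 4:6,
  Jan2018 = 7:9,   Feb2018 = 10:12,
  Jan2019 = 13:15, Feb2019 = 16:18
)

yearly2 <- df3 %>%
  # Split each column name into month and year while pivoting;
  # the optional "." also covers names like "Jan.2017"
  pivot_longer(
    -state,
    names_to = c("month", "year"),
    names_pattern = "([A-Za-z]+)\\.?([0-9]{4})",
    values_to = "rate"
  ) %>%
  group_by(state, year) %>%
  summarise(avg = mean(rate, na.rm = TRUE), .groups = "drop") %>%
  # Prefix the year so the column names match the x2017-style output
  pivot_wider(names_from = year, values_from = avg, names_prefix = "x")

yearly2
```

If the yearly columns are wanted next to the monthly ones, the result can be joined back onto the original data with left_join(df3, yearly2, by = "state").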