Row means across multiple columns in sequence

Posted 2025-01-24 09:26:01

I have monthly data for the state unemployment rates from 2017 to 2019, and 2022. I want to get the yearly average unemployment rate for each state.

Is there a way to use sequences or for loops to get the mean of each block of 12 columns, starting from column 2? For 2022, it would only be the mean across three columns (January to March).

I currently have the code below, but it is terribly inefficient, especially once I start working with much larger data sets.

EDIT

# Edit: Below shows unemployment rates for year 2017 (not showing 2018-2022)

> df[2:13]
   Jan.2017 Feb.2017 Mar.2017 Apr.2017 May.2017 Jun.2017 Jul.2017 Aug.2017 Sep.2017 Oct.2017 Nov.2017 Dec.2017
1       5.5      5.2      5.0      4.8      4.6      4.4      4.3      4.2      4.1      4.0      4.0      4.0
2       6.6      6.6      6.5      6.5      6.5      6.5      6.5      6.5      6.5      6.5      6.5      6.4
3       5.2      5.2      5.1      5.0      5.0      4.9      4.9      4.8      4.9      4.9      4.9      4.9
4       3.8      3.7      3.7      3.7      3.7      3.7      3.7      3.7      3.8      3.8      3.8      3.8

# I am using the below to get the average from Jan2017-Dec2017, Jan 2018-Dec 2018 etc.

df$x2017 <- rowMeans(df[ , c(2:13)], na.rm=TRUE)
df$x2018 <- rowMeans(df[ , c(14:25)], na.rm=TRUE)
df$x2019 <- rowMeans(df[ , c(26:37)], na.rm=TRUE)
df$x2021 <- rowMeans(df[ , c(38:49)], na.rm=TRUE)
df$x2022 <- rowMeans(df[ , c(50:52)], na.rm=TRUE)

# output
  State x2017 x2018 x2019 x2021 x2022
1               Alabama   8.0   7.2   6.6   6.1   5.9
2                Alaska   7.2   7.0   6.6   6.3   6.5
3               Arizona   8.3   7.7   6.7   6.0   5.6
4              Arkansas   7.2   6.9   5.7   4.7   4.0

I’m just looking for something that reduces potential for errors when telling it which columns to get the averages for.

A made-up df, if needed:

state = c("Alabama", "Alaska", "Arizona")
Jan2017 = c(1:3)
Feb2017 = c(4:6)
Jan2018 = c(7:9)
Feb2018 = c(10:12)
Jan2019 = c(13:15)
Feb2019 = c(16:18)
df3=data.frame(state,Jan2017,Feb2017,Jan2018,Feb2018,Jan2019,Feb2019)

> df3
    state Jan2017 Feb2017 Jan2018 Feb2018 Jan2019 Feb2019
1 Alabama       1       4       7      10      13      16
2  Alaska       2       5       8      11      14      17
3 Arizona       3       6       9      12      15      18

Comments (3)

只是在用心讲痛 2025-01-31 09:26:01
cbind(df3[1], sapply(split.default(df3[-1], sub("\\D+", "", names(df3)[-1])), rowMeans))

    state 2017 2018 2019
1 Alabama  2.5  8.5 14.5
2  Alaska  3.5  9.5 15.5
3 Arizona  4.5 10.5 16.5
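
The one-liner above is dense, so here is a step-by-step reading of it as a sketch. The intermediate names `years`, `by_year`, and `yearly` are introduced only for illustration and are not part of the original answer. It assumes every month column name contains the year as its only digits, as in `Jan2017` or `Jan.2017`.

```r
# Made-up data from the question
df3 <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona"),
  Jan2017 = 1:3,   Feb2017 = 4:6,
  Jan2018 = 7:9,   Feb2018 = 10:12,
  Jan2019 = 13:15, Feb2019 = 16:18
)

# 1. Drop the leading non-digit run from each month column name,
#    which leaves only the year ("Jan2017" -> "2017")
years <- sub("\\D+", "", names(df3)[-1])

# 2. Split the month columns into one data frame per year
#    (split.default splits a data frame by columns, not rows)
by_year <- split.default(df3[-1], years)

# 3. Take row means within each year and bind the state column back on
yearly <- cbind(df3[1], sapply(by_year, rowMeans))
print(yearly)
```

Because the grouping comes from the column names rather than from column positions, a year with fewer months (such as January to March only) simply forms a smaller group.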
计㈡愣 2025-01-31 09:26:01

Here is a tidyverse solution with pivoting and summarizing:

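library(dplyr)
library(tidyr)
library(readr)  # parse_number() comes from readr, not dplyr or tidyr

df3 %>% 
  pivot_longer(-state) %>% 
  # pull the year out of names such as "Jan2017"
  mutate(helper = parse_number(name)) %>% 
  group_by(state, helper) %>% 
  mutate(mean = mean(value, na.rm=TRUE)) %>% 
  pivot_wider(names_from = helper,
              values_from = mean) %>% 
  group_by(state) %>% 
  # collapse the per-month rows to one row per state
  summarise(across(-c(name, value), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')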

  state   `2017` `2018` `2019`
  <chr>    <dbl>  <dbl>  <dbl>
1 Alabama    2.5    8.5   14.5
2 Alaska     3.5    9.5   15.5
3 Arizona    4.5   10.5   16.5
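
A possibly shorter route to the same table is to summarise in long format first and pivot wide only at the end. This is a sketch rather than a drop-in replacement: the column names `month`, `year`, and `rate` are my own, and the regular expression assumes each column name is letters, an optional dot, then a four-digit year (so both `Jan2017` and `Jan.2017` would match).

```r
library(dplyr)
library(tidyr)

# Made-up data from the question
df3 <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona"),
  Jan2017 = 1:3,   Feb2017 = 4:6,
  Jan2018 = 7:9,   Feb2018 = 10:12,
  Jan2019 = 13:15, Feb2019 = 16:18
)

yearly <- df3 %>%
  # split each column name into a month part and a year part
  pivot_longer(-state,
               names_to = c("month", "year"),
               names_pattern = "([A-Za-z]+)\\.?(\\d{4})") %>%
  # one mean per state and year
  group_by(state, year) %>%
  summarise(rate = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # back to one column per year
  pivot_wider(names_from = year, values_from = rate)

print(yearly)
```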
拥抱没勇气 2025-01-31 09:26:01

Here is another slightly different tidyverse option:

library(tidyverse)

df3 %>%
  pivot_longer(
    cols = -state,
    names_to = c(NA, ".value"),
    names_pattern = "(.*)(\\d{4})"
  ) %>%
  group_by(state) %>% 
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE)))

Output

  state   `2017` `2018` `2019`
  <chr>    <dbl>  <dbl>  <dbl>
1 Alabama    2.5    8.5   14.5
2 Alaska     3.5    9.5   15.5
3 Arizona    4.5   10.5   16.5
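
For completeness, the position-based idea from the question (a block of 12 columns at a time, starting from column 2, with a shorter final block) can be sketched in base R as well. The data frame here is a hypothetical stand-in with 27 monthly columns, and the year labels are assumed, not read from the data. This approach relies on the columns being contiguous and in chronological order, so the name-based answers above are probably the safer choice.

```r
# Hypothetical stand-in: 4 states, 12 + 12 + 3 monthly columns
months <- matrix(seq_len(4 * 27), nrow = 4)
df <- data.frame(State = c("Alabama", "Alaska", "Arizona", "Arkansas"), months)

cols   <- 2:ncol(df)                     # monthly columns start at column 2
block  <- ceiling(seq_along(cols) / 12)  # 1 twelve times, 2 twelve times, 3 three times
groups <- split(cols, block)             # column indices for each block

# Row means per block; the last block simply averages fewer columns
yearly <- sapply(groups, function(idx) rowMeans(df[idx], na.rm = TRUE))
colnames(yearly) <- paste0("x", c(2017, 2018, 2019))  # assumed labels

result <- cbind(df["State"], yearly)
print(result)
```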