How to add percentile (/quantile) values to columns in a data frame
My data set contains daily flow rate measurements for a river from 2009 to 2021. The data are split into seasons: Winter (December, January, February), Spring (March, April, May), Summer (June, July, August) and Autumn (September, October, November).
This is a sample of my data set:
> (chitt_brook_wylye_2)
# A tibble: 4,437 x 7
river year season month date flow_rate quality
<chr> <dbl> <chr> <chr> <dttm> <dbl> <chr>
1 chittern_brook 2009 Winter December 2009-12-01 00:00:00 0.059 Good
2 chittern_brook 2009 Winter December 2009-12-02 00:00:00 0.061 Good
3 chittern_brook 2009 Winter December 2009-12-03 00:00:00 0.064 Good
4 chittern_brook 2009 Winter December 2009-12-04 00:00:00 0.068 Good
5 chittern_brook 2009 Winter December 2009-12-05 00:00:00 0.076 Good
6 chittern_brook 2009 Winter December 2009-12-06 00:00:00 0.138 Good
7 chittern_brook 2009 Winter December 2009-12-07 00:00:00 0.592 Good
8 chittern_brook 2009 Winter December 2009-12-08 00:00:00 1.04 Good
9 chittern_brook 2009 Winter December 2009-12-09 00:00:00 1.46 Good
10 chittern_brook 2009 Winter December 2009-12-10 00:00:00 1.7 Good
# ... with 4,427 more rows
I want to find the 95th percentile, 5th percentile, median and mean of each season of every year, with the 95th percentile, 5th percentile, median and mean in separate columns of a new data frame.
For example:
> (df)
# A tibble: 49 x 2
season_label flow_rate_mean Q95 Q5 flow_rate_median
<chr> <dbl>
1 Winter 2009 0.453 3 2 4
2 Spring 2010 0.519 6 3 4
3 Summer 2010 0.0627 4 3 6
4 Autumn 2010 0.0415 6 2 6
5 Winter 2010 0.0622 8 3 3
6 Spring 2011 0.188 10 3 2
7 Summer 2011 0.0499 2 3 2
8 Autumn 2011 0.0383 2 2 1
9 Winter 2011 0.0461 5 2 7
10 Spring 2012 0.0925 3 2 8
# ... with 39 more rows
I currently have this code, which creates the above data frame with just the first two columns, but I would like it to also include the 95th percentile, 5th percentile and median. Is this feasible in one step, or will I need to compute them separately and then combine the results into one data frame?
df <- chitt_brook_wylye_2 %>%
  dplyr::mutate(month = as.numeric(format(date,"%m")),
                year = as.numeric(format(date,"%Y")),
                season_id = (12*year + month) %/% 3) %>%
  dplyr::group_by(season_id) %>%
  dplyr::mutate(season_label = paste(season, min(year))) %>%
  dplyr::group_by(season_id,season_label) %>%
  dplyr::summarise(flow_rate = mean(flow_rate))
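One way this might be done in a single pipeline, sketched under assumptions: `summarise()` accepts several name-value pairs, so the quantiles and median can be computed next to the mean. Here `flow_sample` is a hypothetical stand-in built from the ten rows printed above (the full `chitt_brook_wylye_2` is not available), `season_stats` is an illustrative name, and `quantile()` is left on its default interpolation method.

```r
library(dplyr)

# Hypothetical stand-in for chitt_brook_wylye_2: the ten rows printed above
flow_sample <- data.frame(
  date = as.POSIXct("2009-12-01", tz = "UTC") + 86400 * 0:9,  # ten consecutive days
  season = "Winter",
  flow_rate = c(0.059, 0.061, 0.064, 0.068, 0.076,
                0.138, 0.592, 1.04, 1.46, 1.7)
)

season_stats <- flow_sample %>%
  mutate(month = as.numeric(format(date, "%m")),
         year = as.numeric(format(date, "%Y")),
         season_id = (12 * year + month) %/% 3) %>%  # same season id as the original code
  group_by(season_id) %>%
  mutate(season_label = paste(season, min(year))) %>%
  group_by(season_id, season_label) %>%
  # Several summaries in one call; names = FALSE drops the "95%" / "5%" labels
  summarise(flow_rate_mean = mean(flow_rate),
            Q95 = quantile(flow_rate, 0.95, names = FALSE),
            Q5 = quantile(flow_rate, 0.05, names = FALSE),
            flow_rate_median = median(flow_rate),
            .groups = "drop")

print(season_stats)
```

If the real data contain missing flow rates, `quantile()` stops with an error unless `na.rm = TRUE` is passed, so that argument would probably be needed in each summary call.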
Reproducible example and code:
library(dplyr) # provides %>%, mutate(), group_by() and summarise()

date <- as.Date(c("2009-12-01","2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
season <- c("Winter","Winter","Winter","Spring","Spring","Spring","Summer","Summer","Summer","Autumn","Autumn","Autumn","Winter")
var <- c(1,2,3,5,5,5,7,7,7,9,9,9,10)
df <- data.frame(date,season,var) %>% # create the data frame
  dplyr::mutate(month = as.numeric(format(date,"%m")),
                year = as.numeric(format(date,"%Y")),
                season_id = (12*year + month) %/% 3) %>% # identifier for every season present in the data (December falls in the same season as the following January and February)
  dplyr::group_by(season_id) %>% # group by that identifier
  dplyr::mutate(season_label = paste(min(year),season)) %>% # label each season with its earliest year
  dplyr::group_by(season_id,season_label) %>% # keep season_label through the summarise() that follows
  dplyr::summarise(var = mean(var)) # compute the mean
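For the reproducible example, a more compact variant might use `across()` with a named list of functions, so that every statistic is declared once. This is only a sketch, assuming dplyr 1.0.0 or later; `toy` and `toy_stats` are illustrative names, and the resulting columns follow the default `{column}_{function}` naming (`var_mean`, `var_Q95`, `var_Q5`, `var_median`).

```r
library(dplyr)

# Same toy data as the reproducible example, built more compactly
toy <- data.frame(
  date = seq(as.Date("2009-12-01"), by = "month", length.out = 13),
  season = rep(c("Winter", "Spring", "Summer", "Autumn", "Winter"),
               times = c(3, 3, 3, 3, 1)),
  var = c(1, 2, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9, 10)
)

toy_stats <- toy %>%
  mutate(month = as.numeric(format(date, "%m")),
         year = as.numeric(format(date, "%Y")),
         season_id = (12 * year + month) %/% 3) %>%
  group_by(season_id) %>%
  mutate(season_label = paste(min(year), season)) %>%
  group_by(season_id, season_label) %>%
  # One across() call applies every function in the list to `var`
  summarise(across(var, list(mean = mean,
                             Q95 = ~ quantile(.x, 0.95, names = FALSE),
                             Q5 = ~ quantile(.x, 0.05, names = FALSE),
                             median = median)),
            .groups = "drop")

print(toy_stats)
```

With this toy data the first row is `2009 Winter` (values 1, 2, 3), giving a mean and median of 2, a 95th percentile of 2.9 and a 5th percentile of 1.1 under the default quantile method. Either form returns one data frame, so no separate merge step should be required.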