获取达到所需金额所需的天数

发布于 2025-01-17 05:51:25 字数 1724 浏览 6 评论 0原文

我目前有一个数据框，其中包含站点名称、降雨日期、降雨量（随附示例），我有兴趣探索每个站点达到特定降雨量所需的天数（和/或月数）。例如：

是否有可能基于像示例这样的数据集获得像上面这样的输出？我最初的思考过程是单独过滤每个站点，加入一个日历数据框，从该范围中提取最小值和最大值，计算它们之间的天数，并使用 case_when 对它们进行分类。这种方法似乎有点复杂，希望得到任何有关更好方法的指导。

感谢您的建议！

示例数据集：

Example <- structure(list(Name.Station = c("Station A", "Station A", "Station A", 
                                        "Station A", "Station A", "Station B", "Station B", "Station B", 
                                        "Station C", "Station C", "Station C", "Station C"), Rainfall.Date = c("7/10/2020", 
                                                                                                               "8/12/2020", "8/01/2021", "25/06/2021", "26/10/2021", "7/01/2020", 
                                                                                                               "22/01/2020", "5/02/2020", "5/09/2020", "5/10/2020", "5/11/2020", 
                                                                                                               "5/12/2020"), Rainfall.Amount = c(210, 210, 208.47, 208.16, 203.67, 
                                                                                                                                                 227.49, 225, 222.54, 250, 250, 246.18, 245.15)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                      -12L))

原文

I currently have a data frame which contains Name of Station, Date of Rainfall, Amount of Rainfall (Example attached) I am interested in exploring the number of days (and/or months) it takes each Station to reach a particular amount of rainfall.
For example:

Is it possible to obtain an output like the one above based on a dataset like the example one?
My initial thought process is to filter each station individually, join to a calendar dataframe which extracts the min and max from that range, count the days between them and use a case_when to categorize them. This approach seems a bit convoluted and would appreciate any guidance as to what would be a better approach.

Thanks for the suggestions!

Example Dataset:

Example <- structure(list(Name.Station = c("Station A", "Station A", "Station A", 
                                        "Station A", "Station A", "Station B", "Station B", "Station B", 
                                        "Station C", "Station C", "Station C", "Station C"), Rainfall.Date = c("7/10/2020", 
                                                                                                               "8/12/2020", "8/01/2021", "25/06/2021", "26/10/2021", "7/01/2020", 
                                                                                                               "22/01/2020", "5/02/2020", "5/09/2020", "5/10/2020", "5/11/2020", 
                                                                                                               "5/12/2020"), Rainfall.Amount = c(210, 210, 208.47, 208.16, 203.67, 
                                                                                                                                                 227.49, 225, 222.54, 250, 250, 246.18, 245.15)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                      -12L))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

夜吻♂芭芘 2025-01-24 05:51:25

通过站，您可以计算出大于阈值的降雨量cumsum（以毫米为单位）。然后计算从开始日期到累计和中最大值的日期的序列天数的长度。

不过，首先，您的日期格式应该正确。

Example <- transform(Example, Rainfall.Date=as.Date(Rainfall.Date, '%d/%m/%Y'))

do.call(rbind, by(Example, Example$Name.Station, \(x) {
  f <- \(mm, x.=x) {
    mx <- which.max(cumsum(x.$Rainfall.Amount) > mm)
    length(do.call(seq.Date, c(as.list(range(x.$Rainfall.Date[1:mx])), 1)))
  }
  ds <- seq.int(200, 1e3, 200)  ## sequence of 200, 400, ... , 1000mm
  r <- t(vapply(ds, f, 0))
  data.frame(Name.Station=el(x$Name.Station), `colnames<-`(r, paste0('d_', ds)))
}))
#           Name.Station d_200 d_400 d_600 d_800 d_1000
# Station A    Station A     1    63    94   262    385
# Station B    Station B     1    16    30     1      1
# Station C    Station C     1    31    62    92      1

注意：使用 R >= 4.1。

by station you could calculate the cumsum of rainfall greater than the threshold in mm. Then calculate the length of the sequence of days from start date to the date which is maximum in the cumsum.

First of all, though, your dates should be formatted properly.

Example <- transform(Example, Rainfall.Date=as.Date(Rainfall.Date, '%d/%m/%Y'))

do.call(rbind, by(Example, Example$Name.Station, \(x) {
  f <- \(mm, x.=x) {
    mx <- which.max(cumsum(x.$Rainfall.Amount) > mm)
    length(do.call(seq.Date, c(as.list(range(x.$Rainfall.Date[1:mx])), 1)))
  }
  ds <- seq.int(200, 1e3, 200)  ## sequence of 200, 400, ... , 1000mm
  r <- t(vapply(ds, f, 0))
  data.frame(Name.Station=el(x$Name.Station), `colnames<-`(r, paste0('d_', ds)))
}))
#           Name.Station d_200 d_400 d_600 d_800 d_1000
# Station A    Station A     1    63    94   262    385
# Station B    Station B     1    16    30     1      1
# Station C    Station C     1    31    62    92      1

Note: R >= 4.1 used.

回复收藏 0 原文

揽清风入怀 2025-01-24 05:51:25

这是一个 tidyverse 方法：

library(dplyr)
library(tidyr)

Example %>%
  group_by(Name.Station) %>%
  mutate(Rainfall.Date = as.Date(Rainfall.Date, "%d/%m/%Y"),
         days = cumsum(c(1, diff(Rainfall.Date))),
         crainfall = cumsum(Rainfall.Amount),
         fi = (findInterval(crainfall, seq(0, 1000, 200)) -1) * 200) %>%
  pivot_wider(id_cols = Name.Station, names_from = fi, values_from = days, names_glue = {"days_to_{fi}_mm"}, values_fn = min)

# A tibble: 3 x 6
# Groups:   Name.Station [3]
  Name.Station days_to_200_mm days_to_400_mm days_to_600_mm days_to_800_mm days_to_1000_mm
  <chr>                 <dbl>          <dbl>          <dbl>          <dbl>           <dbl>
1 Station A                 1             63             94            262             385
2 Station B                 1             16             30             NA              NA
3 Station C                 1             31             62             92              NA

Here is a tidyverse approach:

library(dplyr)
library(tidyr)

Example %>%
  group_by(Name.Station) %>%
  mutate(Rainfall.Date = as.Date(Rainfall.Date, "%d/%m/%Y"),
         days = cumsum(c(1, diff(Rainfall.Date))),
         crainfall = cumsum(Rainfall.Amount),
         fi = (findInterval(crainfall, seq(0, 1000, 200)) -1) * 200) %>%
  pivot_wider(id_cols = Name.Station, names_from = fi, values_from = days, names_glue = {"days_to_{fi}_mm"}, values_fn = min)

# A tibble: 3 x 6
# Groups:   Name.Station [3]
  Name.Station days_to_200_mm days_to_400_mm days_to_600_mm days_to_800_mm days_to_1000_mm
  <chr>                 <dbl>          <dbl>          <dbl>          <dbl>           <dbl>
1 Station A                 1             63             94            262             385
2 Station B                 1             16             30             NA              NA
3 Station C                 1             31             62             92              NA

回复收藏 0 原文

~没有更多了~