
Posted on 2025-01-19 05:48:00


I have the following data frame:

> head(df)
# A tibble: 6 x 6
# Groups:   lat, decade [2]
    lat  long date       year  decade    month_day
  <dbl> <dbl> <date>     <chr> <chr>     <chr>    
1    55    18 1952-02-03 1952  1950-1959 02-03    
2    55    18 1958-02-08 1958  1950-1959 02-08    
3    55    18 1958-02-08 1958  1950-1959 02-08    
4    55    18 1958-02-08 1958  1950-1959 02-08    
5    55    18 1965-02-07 1965  1960-1969 02-07    
6    55    18 1966-03-03 1966  1960-1969 03-03    

> summary(df)
          lat             long         date                year              decade         
     Min.   :55.00   Min.   :18   Min.   :1951-03-22   Length:1414        Length:1414       
     1st Qu.:56.00   1st Qu.:18   1st Qu.:1987-01-01   Class :character   Class :character  
     Median :58.00   Median :18   Median :2004-04-02   Mode  :character   Mode  :character  
     Mean   :59.07   Mean   :18   Mean   :1999-02-16                                        
     3rd Qu.:62.00   3rd Qu.:18   3rd Qu.:2014-01-01                                        
     Max.   :68.00   Max.   :18   Max.   :2021-03-28                                        
     Class :character  
     Mode  :character 

I would like to get the median month_day for each degree of latitude (lat) and each decade.

I have tried this but cannot get past an error:

df = df %>%
  group_by(lat, decade) %>%
  summarise(across(month_day, median))

Error in `summarise()`:
! Problem while computing `..1 = across(month_day, median)`.
Caused by error:
! `month_day` must return compatible vectors across groups.
i Result type for group 1 (lat = 55, decade = "1950-1959"): <double>.
i Result type for group 2 (lat = 55, decade = "1960-1969"): <character>.

I do not know how to solve it, thank you very much for your help.


> ds_filtered_median[ds_filtered_median$lat == '57', ]
# A tibble: 124 x 6
     lat  long date       year  decade    month_day
   <dbl> <dbl> <date>     <chr> <chr>     <chr>    
 1    57    18 1955-04-08 1955  1950-1959 04-08    
 2    57    18 1957-02-19 1957  1950-1959 02-19    
 3    57    18 1958-04-06 1958  1950-1959 04-06    
 4    57    18 1959-01-01 1959  1950-1959 01-01    
 5    57    18 1960-01-03 1960  1960-1969 01-03    
 6    57    18 1961-01-02 1961  1960-1969 01-02    
 7    57    18 1962-01-02 1962  1960-1969 01-02    
 8    57    18 1963-01-01 1963  1960-1969 01-01    
 9    57    18 1964-01-19 1964  1960-1969 01-19    
10    57    18 1965-01-12 1965  1960-1969 01-12    
# ... with 114 more rows


哀由 2025-01-26 05:48:00



What you can do is convert each date to the number of days since the start of its year. From that number you can easily calculate the median. Then convert the days back using any first of January as a reference. You can be off by one in leap years, though. For date manipulation I used lubridate.


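library(dplyr)
library(lubridate)

data %>%
  mutate(
    date = ymd(date),
    # days elapsed since 1 January of the same year
    days_since_january = as.numeric(date - ymd(paste(year(date), 1, 1, sep = "-")))
  ) %>%
  group_by(lat, decade) %>%
  summarise(across(days_since_january, median), .groups = "keep") %>%
  # map the median offset back onto a reference year (1960 is a leap year)
  mutate(median_month_date = format(ymd("1960-01-01") + days(floor(days_since_january)), "%m-%d"))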

# A tibble: 2 x 4
# Groups:   lat, decade [2]
    lat decade    days_since_january median_month_date
  <dbl> <chr>                  <dbl> <chr>            
1    55 1950-1959                 38 02-08            
2    55 1960-1969                 49 02-19  

# A tibble: 2 x 4
# Groups:   lat, decade [2]
    lat decade    days_since_january median_month_date
  <int> <chr>                  <dbl> <chr>            
1    57 1950-1959               72   03-13            
2    57 1960-1969                1.5 01-02   
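
To make the leap-year caveat above concrete, here is a minimal base R sketch. The variable names are illustrative and not part of the original answer. It computes the offset of 3 March from 1 January in a leap year and in a common year, then maps the common-year offset back onto the leap reference year 1960 used above.

```r
# Offset of 3 March from 1 January in a leap year (1964) and a common year (1966)
offset_leap   <- as.numeric(as.Date("1964-03-03") - as.Date("1964-01-01"))
offset_common <- as.numeric(as.Date("1966-03-03") - as.Date("1966-01-01"))

# The same calendar day gets two different offsets
print(c(leap = offset_leap, common = offset_common))  # 62 and 61

# Mapping the common-year offset back onto leap year 1960 lands one day early
mapped_back <- format(as.Date("1960-01-01") + offset_common, "%m-%d")
print(mapped_back)  # "03-02"
```

So for dates after February the result can shift by one day, depending on whether the reference year and the observation years agree on being leap years.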

饮惑 2025-01-26 05:48:00

You must convert month_day to numeric to get the median. across is only needed if something is calculated for multiple columns individually e.g. to get median lon and lat using data %>% summarise(across(any_of(c("lat", "long")), median))


library(tidyverse)

data <- tribble(
  ~lat, ~long, ~date, ~year, ~decade, ~month_day,
  55, 18, "1952-02-03", 1952, "1950-1959", "02-03",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
  55, 18, "1965-02-07", 1965, "1960-1969", "02-07",
  55, 18, "1966-03-03", 1966, "1960-1969", "03-03"
)

data %>%
  mutate(
    # extract the trailing digits (the day part of "MM-DD") as a number
    month_day_num = month_day %>% str_extract("[0-9]+$") %>% as.numeric()
  ) %>%
  group_by(lat, decade) %>%
  summarise(
    median_month_day = median(month_day_num)
  )
#> `summarise()` has grouped output by 'lat'. You can override using the `.groups`
#> argument.
#> # A tibble: 2 × 3
#> # Groups:   lat [1]
#>     lat decade    median_month_day
#>   <dbl> <chr>                <dbl>
#> 1    55 1950-1959                8
#> 2    55 1960-1969                5

Created on 2022-04-05 by the reprex package (v2.0.0)
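
Note that the regex above keeps only the day part of month_day, so the month is ignored. As a variation, the sketch below converts the whole "MM-DD" string to a day of the year before taking the median. It is a base R illustration under stated assumptions, not the answerer's method: the reference year 2001 is an arbitrary non-leap year, the column names day_of_year and median_month_day are made up for this example, and a "02-29" value would not parse against a non-leap reference year.

```r
# Sample rows from the question (only the columns needed here)
df <- data.frame(
  lat = 55,
  decade = c(rep("1950-1959", 4), rep("1960-1969", 2)),
  month_day = c("02-03", "02-08", "02-08", "02-08", "02-07", "03-03")
)

# Assumption: parse every month_day against the fixed non-leap year 2001
ref_dates <- as.Date(paste0("2001-", df$month_day))
df$day_of_year <- as.numeric(format(ref_dates, "%j"))  # 1 = 1 January

# Median day of the year per latitude and decade
res <- aggregate(day_of_year ~ lat + decade, data = df, FUN = median)

# Convert the (floored) median back to "MM-DD" in the same reference year
res$median_month_day <- format(
  as.Date("2001-01-01") + floor(res$day_of_year) - 1,
  "%m-%d"
)
print(res)
```

On this sample the medians come out as 02-08 for 1950-1959 and 02-19 for 1960-1969.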
