Calculate the median date by two grouping conditions
I have the following data frame:
> head(df)
# A tibble: 6 x 6
# Groups: lat, decade [2]
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 55 18 1952-02-03 1952 1950-1959 02-03
2 55 18 1958-02-08 1958 1950-1959 02-08
3 55 18 1958-02-08 1958 1950-1959 02-08
4 55 18 1958-02-08 1958 1950-1959 02-08
5 55 18 1965-02-07 1965 1960-1969 02-07
6 55 18 1966-03-03 1966 1960-1969 03-03
> summary(df)
lat long date year decade
Min. :55.00 Min. :18 Min. :1951-03-22 Length:1414 Length:1414
1st Qu.:56.00 1st Qu.:18 1st Qu.:1987-01-01 Class :character Class :character
Median :58.00 Median :18 Median :2004-04-02 Mode :character Mode :character
Mean :59.07 Mean :18 Mean :1999-02-16
3rd Qu.:62.00 3rd Qu.:18 3rd Qu.:2014-01-01
Max. :68.00 Max. :18 Max. :2021-03-28
month_day
Length:1414
Class :character
Mode :character
I would like to get the median month_day by degree of latitude (lat) and per decade.
I have tried this but cannot get past an error:
df = df %>%
group_by(lat, decade) %>%
summarise(across(month_day, median)) %>%
ungroup
Error in `summarise()`:
! Problem while computing `..1 = across(month_day, median)`.
Caused by error:
! `month_day` must return compatible vectors across groups.
i Result type for group 1 (lat = 55, decade = "1950-1959"): <double>.
i Result type for group 2 (lat = 55, decade = "1960-1969"): <character>.
I do not know how to solve it. Thank you very much for your help.
EDIT:
> ds_filtered_median[ds_filtered_median$lat == '57', ]
# A tibble: 124 x 6
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 57 18 1955-04-08 1955 1950-1959 04-08
2 57 18 1957-02-19 1957 1950-1959 02-19
3 57 18 1958-04-06 1958 1950-1959 04-06
4 57 18 1959-01-01 1959 1950-1959 01-01
5 57 18 1960-01-03 1960 1960-1969 01-03
6 57 18 1961-01-02 1961 1960-1969 01-02
7 57 18 1962-01-02 1962 1960-1969 01-02
8 57 18 1963-01-01 1963 1960-1969 01-01
9 57 18 1964-01-19 1964 1960-1969 01-19
10 57 18 1965-01-12 1965 1960-1969 01-12
# ... with 114 more rows
Comments (2)
What you can do is convert your date to the number of days since the start of the year. From that number you can easily calculate the median. Then convert the day count back to a date, using the first of January of any year as the reference. You can be off by one day in leap years, though. For date manipulation I used lubridate.
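The steps described in this answer could be sketched roughly as follows. This is not the answerer's original code, just a minimal illustration that assumes dplyr and lubridate are installed. The data are the six rows shown by head(df) in the question; the names ref_start, median_doy and median_month_day are made up for the example, and 2001 is an arbitrary non-leap reference year.

```r
library(dplyr)
library(lubridate)

# First six rows shown in the question (only the columns needed here)
df <- tibble(
  lat    = rep(55, 6),
  decade = c(rep("1950-1959", 4), rep("1960-1969", 2)),
  date   = as.Date(c("1952-02-03", "1958-02-08", "1958-02-08",
                     "1958-02-08", "1965-02-07", "1966-03-03"))
)

# Assumption: January 1 of a non-leap year is used to convert back to a date
ref_start <- as.Date("2001-01-01")

result <- df %>%
  group_by(lat, decade) %>%
  # yday() returns the day of the year (1 = January 1)
  summarise(median_doy = median(yday(date)), .groups = "drop") %>%
  # Round half-day medians, then turn the day number back into "MM-DD"
  mutate(median_month_day = format(ref_start + round(median_doy) - 1, "%m-%d"))

print(result)
```

Because yday() counts February 29 in leap years, dates after February in a leap year are shifted by one day relative to the non-leap reference year, which is the off-by-one caveat mentioned above.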
You must convert month_day to numeric to get the median. across is only needed if something is calculated for multiple columns individually, e.g. to get the median of long and lat using data %>% summarise(across(any_of(c("lat", "long")), median)).
Created on 2022-04-05 by the reprex package (v2.0.0)
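One possible way to do the numeric conversion suggested in this answer is sketched below. It is not the answerer's original reprex. It assumes dplyr is installed, uses the six rows from head(df) in the question, and introduces the hypothetical names ref_year, doy, median_doy and median_month_day. The leap year 2000 is chosen as the reference so that a value such as "02-29" would still parse. The error in the question is likely explained by median() on a character column: it returns a character value for groups with an odd number of rows and a numeric NA for groups with an even number, so the group results have incompatible types.

```r
library(dplyr)

# First six rows shown in the question (only the columns needed here)
df <- tibble(
  lat       = rep(55, 6),
  decade    = c(rep("1950-1959", 4), rep("1960-1969", 2)),
  month_day = c("02-03", "02-08", "02-08", "02-08", "02-07", "03-03")
)

# Assumption: a leap reference year, so that "02-29" is a valid date
ref_year <- 2000

result <- df %>%
  # Parse "MM-DD" within the reference year and extract the day of the year
  mutate(doy = as.numeric(format(
    as.Date(paste(ref_year, month_day, sep = "-"), format = "%Y-%m-%d"), "%j"
  ))) %>%
  group_by(lat, decade) %>%
  # A single column is summarised, so across() is not needed
  summarise(median_doy = median(doy), .groups = "drop") %>%
  # Drop any half day, then convert the day number back to "MM-DD"
  mutate(median_month_day = format(
    as.Date(paste0(ref_year, "-01-01")) + floor(median_doy) - 1, "%m-%d"
  ))

print(result)
```

Whether to round a half-day median down, up, or to keep it as a number is a design choice; floor() is used here only to get a printable date.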