Replace subsequent values in a data frame based on a condition on the first value in a group
I have this type of data in an ordered R dataframe.
set.seed(25)
date <- sort(as.Date(
  sample(as.numeric(as.Date("2019-01-01")):as.numeric(as.Date("2021-03-31")), 10, replace = TRUE),
  origin = "1970-01-01"
))
type <- c("Football", "Football", "Rugby", "Football", "Hockey", "Tennis", "Hockey", "Basketball", "Basketball", "Rugby")
id <- c("1", "1", "1", "1", "2", "2", "3", "4", "4", "5")
df <- data.frame(date, id, type)
date id type
2019-04-09 1 Football
2019-04-13 1 Football
2019-04-20 1 Rugby
2019-04-21 1 Football
2019-05-31 2 Hockey
2020-02-09 2 Tennis
2020-03-08 3 Hockey
2020-03-24 4 Basketball
2020-08-18 4 Basketball
2020-11-01 5 Rugby
The result I'm trying to get at is this:
date id type type_2
2019-04-09 1 Football Football
2019-04-13 1 Football Football
2019-04-20 1 Rugby Multi
2019-04-21 1 Football Multi
2019-05-31 2 Hockey Hockey
2020-02-09 2 Tennis Multi
2020-03-08 3 Hockey Hockey
2020-03-24 4 Basketball Basketball
2020-08-18 4 Basketball Basketball
2020-11-01 5 Rugby Rugby
Basically, type_2 keeps the first sport an id practices for as long as each subsequent sport is the same as the previous one. As soon as the id changes sport, type_2 becomes "Multi" for that row and for all of that id's later rows.
I tried to do this with lag(), lead() and if_else() in dplyr, but the results never come out the way I want.
Comments (1)
You may use rleid from data.table to generate a run-length id for the type variable within each id. Everything after the first change becomes "Multi". If you prefer, the same logic can be written in dplyr.
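A minimal sketch of both approaches, not necessarily the code originally posted with this answer. It assumes the rows are already sorted by date within each id, and it rebuilds the example data without the date column for brevity. The names dt and res are illustrative, and the dplyr variant uses cumsum() over lag() comparisons as one possible formulation.

```r
library(data.table)
library(dplyr)

# Example data from the question (date column omitted; rows are assumed to be
# in chronological order within each id).
df <- data.frame(
  id   = c("1", "1", "1", "1", "2", "2", "3", "4", "4", "5"),
  type = c("Football", "Football", "Rugby", "Football", "Hockey",
           "Tennis", "Hockey", "Basketball", "Basketball", "Rugby"),
  stringsAsFactors = FALSE
)

# data.table: within each id, rleid(type) is 1 for the first run of identical
# values and increases at every change, so a run id above 1 means "Multi".
dt <- as.data.table(df)
dt[, type_2 := ifelse(rleid(type) == 1L, type, "Multi"), by = id]

# dplyr: count the changes seen so far within each id; once the count is
# above zero, the row and everything after it is "Multi".
res <- df %>%
  group_by(id) %>%
  mutate(
    n_changes = cumsum(type != dplyr::lag(type, default = type[1])),
    type_2    = if_else(n_changes == 0L, type, "Multi")
  ) %>%
  ungroup() %>%
  select(-n_changes)
```

Printing dt or res should show a type_2 column matching the desired result in the question, including "Multi" for id 1 on the final Football row, because the earlier switch to Rugby already ended the first run.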