Combine character variables across rows and columns by group in R
I am a beginner in R and am trying to solve a problem that is, I guess, quite easy for experienced users.
The problem is the following: customers (A, B, C) come in repeatedly and use different programs (Prg). I would like to identify "typical sequences" of programs. To do so, I identify the first program each customer consumes, then the second, and then the third. In a next step, I would like to combine this information into one sequence of programs per customer. For a customer who first consumes Prg1, then Prg2, then Prg3, the final outcome should be "Prg1-Prg2-Prg3".
The code below produces a data frame similar to the one I have. Prg is the program in the respective year, First is the first year in which the customer enters, Sec is the second, and Third is the third.
The code produces columns that extract the program consumed in the first contract (Code_1_Prg), second contract (Code_2_Prg) and third contract (Code_3_Prg).
Unfortunately, I have not managed to combine these three columns into the required goal. I tried to group by ID and save the first element of the sequence in a new column called "chain1". Here I get the following error message, even though I have loaded the magrittr and dplyr packages:
Error in df %>% group_by(ID) %>% df$chain1 = df[df$Code_1_Prg != "NA", : could not find function "%>%<-"
detach(package:plyr)
library(dplyr)
library(magrittr)
df %>%
group_by(ID) %>%
df$chain1 = df[df$Code_1_Prg!="NA", "Code_1_Prg"]
Below is the code that produces the data frame, along with my starting point for extracting the character variable in Code_1_Prg by group.
I would be really grateful if you could help me with this. Thank you very much in advance!
df <- data.frame("ID"=c("A","A","A","A","B", "B", "B","B","B","C","C", "C", "C","C","C","C"),
"Year_Contract" =c("2010", "2015", "2017","2017","2010","2010", "2015","2015","2020","2015","2015","2017","2017","2017","2018","2018"),
"Prg"=c("AIB","AIB","LLA","LLA","BBU","BBU", "KLU","KLU","DDI","CKN","CKN","BBU","BBU","BBU","KLU","KLU"),
"First"=c("2010","2010","2010","2010","2010","2010", "2010","2010","2010","2015","2015","2015","2015","2015","2015","2015"),
"Sec"=c("2015","2015","2015","2015","2015","2015", "2015","2015","2015","2017","2017","2017","2017","2017","2017","2017"),
"Third"=c("2017","2017","2017","2017","2020","2020", "2020","2020","2020","2018","2018","2018","2018","2018","2018","2018")
)
df$Code_1_Prg <- ifelse(df$Year_Contract == df$First, df$Prg, NA)
df$Code_2_Prg <- ifelse(df$Year_Contract == df$Sec, df$Prg, NA)
df$Code_3_Prg <- ifelse(df$Year_Contract == df$Third, df$Prg, NA)
detach(package:plyr)
library(dplyr)
library(magrittr)
df %>%
group_by(ID) %>%
df$chain1 = df[df$Code_1_Prg!="NA", "Code_1_Prg"]
# This is the final column I am trying to create
df2 <- data.frame("ID"=c("A","B", "C"),
"Goal" =c("AIB-LLA", "BBU-KLU-DDI", "CKN-BBU-KLU")
)
df <- merge(df, df2, by="ID")
Are you looking for something like this?
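One possible approach, sketched here under stated assumptions rather than as a definitive solution: the error in the question arises because the last step of the pipe is an assignment, not a function call. R reads `df %>% group_by(ID) %>% df$chain1 = ...` as an assignment whose left-hand side is a call to `%>%`, so it looks for a replacement function named `%>%<-`, which does not exist. Note also that `!= "NA"` compares against the string "NA"; missing values are tested with `is.na()`. The sketch below skips the intermediate Code_*_Prg columns and builds the sequence directly from Prg, ordered by Year_Contract. The names `chains` and `df_goal` are hypothetical, and the sketch assumes each program should appear once per customer, at its first occurrence, which matches the Goal values given in the question.

```r
library(dplyr)

# Only the three columns needed for the sequence are rebuilt here;
# the values are copied from the data frame in the question.
df <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "B", "B",
         "C", "C", "C", "C", "C", "C", "C"),
  Year_Contract = c("2010", "2015", "2017", "2017",
                    "2010", "2010", "2015", "2015", "2020",
                    "2015", "2015", "2017", "2017", "2017", "2018", "2018"),
  Prg = c("AIB", "AIB", "LLA", "LLA",
          "BBU", "BBU", "KLU", "KLU", "DDI",
          "CKN", "CKN", "BBU", "BBU", "BBU", "KLU", "KLU"),
  stringsAsFactors = FALSE
)

# Assumption: a program counts once per customer, at its first appearance,
# so AIB (2010), AIB (2015), LLA (2017) collapses to "AIB-LLA".
chains <- df %>%
  arrange(ID, as.integer(Year_Contract)) %>%  # chronological order per customer
  group_by(ID) %>%
  summarise(Goal = paste(unique(Prg), collapse = "-"), .groups = "drop")

# Attach the sequence to every row, as the merge() in the question does.
df_goal <- left_join(df, chains, by = "ID")

print(chains)
# Goal column: "AIB-LLA", "BBU-KLU-DDI", "CKN-BBU-KLU"
```

If a customer can return to an earlier program and that return should show up in the sequence (for example AIB-LLA-AIB), `unique(Prg)` could be swapped for `rle(Prg)$values`, which collapses only consecutive repeats.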