Handling duplicates in combined data frames
First of all, a big shout-out and thank you to everyone who has helped answer my questions. You guys are amazing.
I need your help once again with some R code.
I have two data frames: one (DataframeP in the code below) describes a Portuguese class and the other (DataframeM) describes a Math class. Some students take both classes, so they appear in both data frames. I want to find these duplicates and, rather than deleting them, mark them in the column "Class" as being in both classes, with something like "Math+Portuguese".
I simplified my data frames by creating two new ones (in reality they are much bigger, but the final approach should be the same). There is one duplicate (the student whose parents are both doctors). I want him to appear only once in the combined data frame, with "Math+Portuguese" in the column "Class".
When identifying the duplicates, the column "Grade" has to be ignored.
Thank you very much for your help.
All the best,
Alexander
# Creation of dataset 1 (Portuguese students)
school <- c(rep("S1", 7), rep("S2", 3))
Age <- c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22)
professionf <- c(rep("teacher", 9), rep("doctor", 1))
professionm <- c(rep("police", 9), rep("doctor", 1))
Class <- rep("Portuguese", 10)
Grade <- round(runif(10, 1, 5), 0)
# data.frame() keeps Age and Grade numeric; cbind() would return a character matrix
DataframeP <- data.frame(school, Age, professionf, professionm, Grade, Class, stringsAsFactors = FALSE)
View(DataframeP)
# Creation of dataset 2 (Math students)
school <- c(rep("S1", 7), rep("S2", 3))
Age <- c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22)
professionf <- c(rep("lawyer", 9), rep("doctor", 1))
professionm <- c(rep("police", 9), rep("doctor", 1))
Class <- rep("Math", 10)
Grade <- round(runif(10, 1, 5), 0)
DataframeM <- data.frame(school, Age, professionf, professionm, Grade, Class, stringsAsFactors = FALSE)
View(DataframeM)
# Combination of the two data frames, where the identification of the duplicates should take place
DF_All <- rbind(DataframeM, DataframeP)
View(DF_All)
Comments (1)
That should do it, dear Alexander!
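One possible way to get the result described in the question, as a minimal base R sketch and not necessarily the approach this reply refers to. It assumes that the four columns other than "Grade" and "Class" are enough to identify a student, and that the Grade from the Math row is the one to keep for a student found in both classes. The names keys, make_key and DF_Result are made up for this illustration.

```r
# Rebuild the example data as data frames (values taken from the question).
DataframeP <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("teacher", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = round(runif(10, 1, 5), 0),
  Class       = "Portuguese",
  stringsAsFactors = FALSE
)
DataframeM <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("lawyer", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = round(runif(10, 1, 5), 0),
  Class       = "Math",
  stringsAsFactors = FALSE
)

# Assumption: these columns identify a student; Grade and Class are ignored.
keys <- c("school", "Age", "professionf", "professionm")

# Hypothetical helper: one string per row, built from the key columns only.
make_key <- function(df) do.call(paste, c(df[keys], sep = "|"))

# Key combinations that occur in both classes.
shared <- intersect(make_key(DataframeM), make_key(DataframeP))

DF_All  <- rbind(DataframeM, DataframeP)
key_all <- make_key(DF_All)
in_both <- key_all %in% shared

# Relabel students found in both classes, then drop their repeated rows.
# Rows that repeat only within one class are left untouched.
DF_All$Class[in_both] <- "Math+Portuguese"
DF_Result <- DF_All[!(in_both & duplicated(key_all)), ]

nrow(DF_Result)  # 19 rows: 9 Math, 9 Portuguese, 1 "Math+Portuguese"
```

Because DataframeM comes first in rbind(), the row kept for the shared student is the Math one. With real data, a proper student ID column would be a safer key than this combination of attributes, since the example itself contains several students with identical school, age and parents' professions.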