Handling duplicates in combined data frames

Posted on 2025-02-03 08:36:22

First of all, a big shout-out and thank you to everyone who has helped answer my questions. You guys are amazing.

I need your help once again with coding in R.
The situation involves two data frames: one describes a Portuguese class and the other describes a Math class. Some students take both classes and therefore appear in both data frames. I want to find these duplicates and, instead of deleting them, keep each such student once and indicate in the column "Class" that he is in both classes, something like "Math+Portuguese".

I tried to simplify my data frames by creating two new ones (in reality they are much bigger, but the final approach should be the same). There is one duplicate (the student whose parents are both doctors). I just want to have him once in the combined data frame, with "Math+Portuguese" in the column "Class".

For the identification of the duplicates, the column "Grade" has to be ignored.

Thank you very much for your help.
All the best,
Alexander

# Creation of Dataset 1 (Portuguese students)
school <- c(rep("S1", 7), rep("S2", 3))
Age <- c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22)
professionf <- c(rep("teacher", 9), rep("doctor", 1))
professionm <- c(rep("police", 9), rep("doctor", 1))
Class <- rep("Portuguese", 10)
Grade <- round(runif(10, 1, 5), 0)
# data.frame() instead of cbind(): cbind() would return a character matrix
DataframeP <- data.frame(school, Age, professionf, professionm, Grade, Class)
View(DataframeP)

# Creation of Dataset 2 (Math students)
school <- c(rep("S1", 7), rep("S2", 3))
Age <- c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22)
professionf <- c(rep("lawyer", 9), rep("doctor", 1))
professionm <- c(rep("police", 9), rep("doctor", 1))
Class <- rep("Math", 10)
Grade <- round(runif(10, 1, 5), 0)
DataframeM <- data.frame(school, Age, professionf, professionm, Grade, Class)
View(DataframeM)

# Combination of the two data frames, where the identification of the duplicates should take place
DF_All <- rbind(DataframeM, DataframeP)
View(DF_All)

明明#如月 2025-02-10 08:36:22

That should do it, dear Alexander!

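library(data.table)  # for setnames()
library(dplyr)       # for coalesce()

# Full outer join on the identifying columns; Grade and Class are not keys,
# so they come back as Grade.x/Class.x (Portuguese) and Grade.y/Class.y (Math)
df_merged <- merge(x = DataframeP, y = DataframeM, by = c("school", "Age", "professionf", "professionm"), all = TRUE)
# Students found in both data frames get the combined label
df_merged <- within(df_merged, Class.x[Class.x == 'Portuguese' & Class.y == 'Math'] <- 'Portuguese + Math')
# Math-only students have NA in the .x columns; fill them from the .y columns
df_merged$Class.x <- coalesce(df_merged$Class.x, df_merged$Class.y)
df_merged$Grade.x <- coalesce(df_merged$Grade.x, df_merged$Grade.y)
# Drop the last two columns (Grade.y, Class.y) and restore the original names
df_merged <- df_merged[1:(length(df_merged)-2)]
setnames(df_merged, old = c('Grade.x','Class.x'), new = c('Grade','Class'))
df_merged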
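
For anyone who prefers to avoid the extra packages, here is a minimal base-R sketch of the same merge idea. It is only an illustration under some assumptions: the sample data is rebuilt with data.frame(), the grades are fixed values instead of runif() so the result is reproducible, and the names keys, merged, in_both and result are made up for this example. The label "Math+Portuguese" is the one from the question.

```r
# Sample data modelled on the question (fixed grades are an assumption)
DataframeP <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("teacher", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  Class       = "Portuguese",
  stringsAsFactors = FALSE
)
DataframeM <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("lawyer", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = c(2, 3, 4, 5, 1, 2, 3, 4, 5, 1),
  Class       = "Math",
  stringsAsFactors = FALSE
)

# Columns that identify a student; Grade is deliberately left out
keys <- c("school", "Age", "professionf", "professionm")

# Full outer join: unmatched rows from either side are kept with NAs
merged <- merge(DataframeP, DataframeM, by = keys, all = TRUE,
                suffixes = c(".P", ".M"))

# A student is in both classes when both Class columns are filled
in_both <- !is.na(merged$Class.P) & !is.na(merged$Class.M)
merged$Class <- ifelse(in_both, "Math+Portuguese",
                       ifelse(is.na(merged$Class.P), merged$Class.M, merged$Class.P))

# Like the answer above, prefer the Portuguese grade when both exist
merged$Grade <- ifelse(is.na(merged$Grade.P), merged$Grade.M, merged$Grade.P)

result <- merged[c(keys, "Grade", "Class")]
print(table(result$Class))
```

With this sample data, result has 19 rows: 9 Math-only students, 9 Portuguese-only students and the one student with two doctor parents labelled "Math+Portuguese". One caveat to keep in mind with real data: if the same key combination occurs several times in both data frames, merge() returns every pairing of those rows, so it may be worth checking the key columns for repeats first.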
