How to combine 2 dataframes, create rows that exist only in the second dataframe but not in the first, and group to get the sum?
I want to combine 2 dataframes. I have tried several methods but am not sure how to achieve the final dataframe. I'd appreciate any advice on how I can do this.
import pandas as pd

data_list_1 = [['Employee', 'Course Name', 'Status'],
['Abel', "Course_A", "Completed"],
['Bain', "Course_A", "Incomplete"]]
data_list_2 = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
['Abel', 'Course_B', 'Lesson_1', 100, ""],
['Abel', 'Course_B', 'Lesson_2', 100, ""],
['Abel', 'Course_B', 'Lesson_3', 100, ""],
['Abel', 'Course_B', 'Lesson_4', 100, ""],
['Bain', 'Course_B', 'Lesson_1', 100, ""],
['Bain', 'Course_B', 'Lesson_2', 100, ""],
['Coot', 'Course_B', 'Lesson_1', 100, ""],
['Coot', 'Course_B', 'Lesson_2', 100, ""],
['Coot', 'Course_B', 'Lesson_3', 100, ""],
['Coot', 'Course_B', 'Lesson_4', 100, ""],
['Coot', 'Course_B', 'Lesson_5', 100, ""]]
Course_A_df = pd.DataFrame(data_list_1[1:], columns = data_list_1[0])
Course_B_df = pd.DataFrame(data_list_2[1:], columns = data_list_2[0])
I want the following dataframe so that I can use it in Tableau for visualisation. Basically, the final df should also have a Course_A row for Coot with None values, and the Course_B Status should be "Completed" if all 5 lesson scores are 100.
to_achieved = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
['Abel', "Course_A", None, None, "Completed"],
['Bain', "Course_A", None, None, "Incomplete"],
['Coot', "Course_A", None, None, None],
['Abel', 'Course_B', 'Lesson_1', 100, "Incomplete"],
['Abel', 'Course_B', 'Lesson_2', 100, "Incomplete"],
['Abel', 'Course_B', 'Lesson_3', 100, "Incomplete"],
['Abel', 'Course_B', 'Lesson_4', 100, "Incomplete"],
['Bain', 'Course_B', 'Lesson_1', 100, "Incomplete"],
['Bain', 'Course_B', 'Lesson_2', 100, "Incomplete"],
['Coot', 'Course_B', 'Lesson_1', 100, "Completed"],
['Coot', 'Course_B', 'Lesson_2', 100, "Completed"],
['Coot', 'Course_B', 'Lesson_3', 100, "Completed"],
['Coot', 'Course_B', 'Lesson_4', 100, "Completed"],
['Coot', 'Course_B', 'Lesson_5', 100, "Completed"]]
to_achieved_df = pd.DataFrame(to_achieved[1:], columns = to_achieved[0])
to_achieved_df
I have tried concat and merge, but neither seems to give me what I want.
df_concat = pd.concat([Course_A_df, Course_B_df], axis=0, ignore_index=True)
df_concat
merged = pd.merge(left=Course_A_df, right=Course_B_df, left_on='Employee', right_on='Employee', how='left')
merged
For the status calculation, I have tried groupby, but is there any way I can check whether the summed value is 500 and update the status?
Thank you!
Comments (1)
You can .reindex Course_A_df to add the missing Employees, then add the "Status" column to Course_B_df, and finally .concat the two.
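The answer's original code did not survive on this page, so what follows is only a sketch of how those three steps might look, not the answerer's actual code. The names `course_a`, `course_b`, `a_full`, `b_full` and `result` are made up for illustration, the sample frames are rebuilt from the question, and the rule "Completed when the employee's Lesson Score total equals 500" is taken from the question's own description.

```python
import pandas as pd

# Sample data rebuilt from the question (hypothetical variable names).
course_a = pd.DataFrame(
    [["Abel", "Course_A", "Completed"],
     ["Bain", "Course_A", "Incomplete"]],
    columns=["Employee", "Course Name", "Status"],
)
course_b = pd.DataFrame(
    [[emp, "Course_B", f"Lesson_{i}", 100, ""]
     for emp, n_lessons in [("Abel", 4), ("Bain", 2), ("Coot", 5)]
     for i in range(1, n_lessons + 1)],
    columns=["Employee", "Course Name", "Lesson Name", "Lesson Score", "Status"],
)

# Step 1: reindex Course_A on every employee seen in either frame,
# so Coot gets a Course_A row whose Status is missing (NaN).
all_employees = sorted(set(course_a["Employee"]) | set(course_b["Employee"]))
a_full = (
    course_a.set_index("Employee")
    .reindex(all_employees)
    .rename_axis("Employee")
    .reset_index()
)
a_full["Course Name"] = "Course_A"

# Step 2: per-employee total of Lesson Score, broadcast back to each row;
# a total of 500 (five lessons at 100) is treated as "Completed".
total = course_b.groupby("Employee")["Lesson Score"].transform("sum")
b_full = course_b.assign(
    Status=total.eq(500).map({True: "Completed", False: "Incomplete"})
)

# Step 3: stack the two frames and restore the desired column order.
result = pd.concat([a_full, b_full], ignore_index=True)
result = result[["Employee", "Course Name", "Lesson Name", "Lesson Score", "Status"]]
print(result)
```

Using `transform("sum")` rather than a plain `groupby(...).sum()` keeps one value per lesson row, so the status can be assigned directly without a second merge. Note that after the concat the Lesson Score column becomes floating point, because the Course_A rows have no score.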