如何组合2个数据框,创建仅在第二个数据框中出现的行,而不出现在第一范围内,而不是在第一个框架和组中以获取总和?

发布于 2025-01-23 14:10:18 字数 2607 浏览 1 评论 0原文

我想结合2个数据范围。我尝试了几种方法,但不确定如何实现最终数据框架。感谢有关我该怎么做的任何建议。

data_list_1 = [['Employee', 'Course Name', 'Status'],
              ['Abel', "Course_A", "Completed"],
              ['Bain', "Course_A", "Incomplete"]]

data_list_2 = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
              ['Abel', 'Course_B', 'Lesson_1', 100, ""],
              ['Abel', 'Course_B', 'Lesson_2', 100, ""],
              ['Abel', 'Course_B', 'Lesson_3', 100, ""],
              ['Abel', 'Course_B', 'Lesson_4', 100, ""],
              ['Bain', 'Course_B', 'Lesson_1', 100, ""],
              ['Bain', 'Course_B', 'Lesson_2', 100, ""],
              ['Coot', 'Course_B', 'Lesson_1', 100, ""],
              ['Coot', 'Course_B', 'Lesson_2', 100, ""],
              ['Coot', 'Course_B', 'Lesson_3', 100, ""],
              ['Coot', 'Course_B', 'Lesson_4', 100, ""],
              ['Coot', 'Course_B', 'Lesson_5', 100, ""]]

Course_A_df = pd.DataFrame(data_list_1[1:], columns = data_list_1[0])
Course_B_df = pd.DataFrame(data_list_2[1:], columns = data_list_2[0])

我想在Tableau中使用以下数据框架来可视化目的。基本上,最终的DF也应该有一个没有值的coot,如果所有5个课程得分均为100

to_achieved = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
              ['Abel', "Course_A", None, None, "Completed"],
              ['Bain', "Course_A", None, None, "Incomplete"],
              ['Coot', "Course_A", None, None, None],              
              ['Abel', 'Course_B', 'Lesson_1', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_2', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_3', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_4', 100, "Incomplete"],
              ['Bain', 'Course_B', 'Lesson_1', 100, "Incomplete"],
              ['Bain', 'Course_B', 'Lesson_2', 100, "Incomplete"],
              ['Coot', 'Course_B', 'Lesson_1', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_2', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_3', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_4', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_5', 100, "Completed"]]

to_achieved_df = pd.DataFrame(to_achieved[1:], columns = to_achieved[0])
to_achieved_df

df_concat = pd.concat([Course_A_df, Course_B_df], axis=0, ignore_index=True)
df_concat
merged = pd.merge(left=Course_A_df, right=Course_B_df, left_on='Employee', right_on='Employee', how='left')
merged

对于状态的计算,我尝试了Groupby,但是我可以检查该值是否为500并更新状态?

谢谢你!

I want to combine 2 dataframes. I have tried several methods but not sure how I can achieve the final dataframe. Appreciate any advice on how can i do this.

data_list_1 = [['Employee', 'Course Name', 'Status'],
              ['Abel', "Course_A", "Completed"],
              ['Bain', "Course_A", "Incomplete"]]

data_list_2 = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
              ['Abel', 'Course_B', 'Lesson_1', 100, ""],
              ['Abel', 'Course_B', 'Lesson_2', 100, ""],
              ['Abel', 'Course_B', 'Lesson_3', 100, ""],
              ['Abel', 'Course_B', 'Lesson_4', 100, ""],
              ['Bain', 'Course_B', 'Lesson_1', 100, ""],
              ['Bain', 'Course_B', 'Lesson_2', 100, ""],
              ['Coot', 'Course_B', 'Lesson_1', 100, ""],
              ['Coot', 'Course_B', 'Lesson_2', 100, ""],
              ['Coot', 'Course_B', 'Lesson_3', 100, ""],
              ['Coot', 'Course_B', 'Lesson_4', 100, ""],
              ['Coot', 'Course_B', 'Lesson_5', 100, ""]]

Course_A_df = pd.DataFrame(data_list_1[1:], columns = data_list_1[0])
Course_B_df = pd.DataFrame(data_list_2[1:], columns = data_list_2[0])

I want to have the following dataframe to use it in Tableau for visualisation purpose. Basically the final df should also have Coot with None values and for Course_B Status to be completed if all 5 Lesson score is 100.

to_achieved = [['Employee', 'Course Name', 'Lesson Name', 'Lesson Score', 'Status'],
              ['Abel', "Course_A", None, None, "Completed"],
              ['Bain', "Course_A", None, None, "Incomplete"],
              ['Coot', "Course_A", None, None, None],              
              ['Abel', 'Course_B', 'Lesson_1', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_2', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_3', 100, "Incomplete"],
              ['Abel', 'Course_B', 'Lesson_4', 100, "Incomplete"],
              ['Bain', 'Course_B', 'Lesson_1', 100, "Incomplete"],
              ['Bain', 'Course_B', 'Lesson_2', 100, "Incomplete"],
              ['Coot', 'Course_B', 'Lesson_1', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_2', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_3', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_4', 100, "Completed"],
              ['Coot', 'Course_B', 'Lesson_5', 100, "Completed"]]

to_achieved_df = pd.DataFrame(to_achieved[1:], columns = to_achieved[0])
to_achieved_df

I have tried concat and merge but it doesn't seems to give me what i want.

df_concat = pd.concat([Course_A_df, Course_B_df], axis=0, ignore_index=True)
df_concat
merged = pd.merge(left=Course_A_df, right=Course_B_df, left_on='Employee', right_on='Employee', how='left')
merged

For the calculation of status, i have tried groupby, but is that any way i can check if the value is 500 and update the status?

Thank you!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

还给你自由 2025-01-30 14:10:18

您可以.reindex course_a_df添加缺失的员工:

Course_A_df = (
    Course_A_df.set_index("Employee")
    .reindex(Course_B_df["Employee"].unique())
    .reset_index()
)
Course_A_df["Course Name"] = Course_A_df["Course Name"].ffill().bfill()

打印:

  Employee Course Name      Status
0     Abel    Course_A   Completed
1     Bain    Course_A  Incomplete
2     Coot    Course_A         NaN

然后将“状态”列添加到cource_b_b_df

Course_B_df["Status"] = Course_B_df.groupby(
    ["Employee", "Course Name"], as_index=False
)["Lesson Score"].transform(
    lambda x: "Complete" if x.sum() == 500 else "Incomplete"
)

prints:prints:

   Employee Course Name Lesson Name  Lesson Score      Status
0      Abel    Course_B    Lesson_1           100  Incomplete
1      Abel    Course_B    Lesson_2           100  Incomplete
2      Abel    Course_B    Lesson_3           100  Incomplete
3      Abel    Course_B    Lesson_4           100  Incomplete
4      Bain    Course_B    Lesson_1           100  Incomplete
5      Bain    Course_B    Lesson_2           100  Incomplete
6      Coot    Course_B    Lesson_1           100    Complete
7      Coot    Course_B    Lesson_2           100    Complete
8      Coot    Course_B    Lesson_3           100    Complete
9      Coot    Course_B    Lesson_4           100    Complete
10     Coot    Course_B    Lesson_5           100    Complete

ofts: thins offs:thins 。 concat两个:

out = pd.concat([Course_A_df, Course_B_df])
print(out[["Employee", "Course Name", "Lesson Name", "Lesson Score", "Status"]])

打印:

   Employee Course Name Lesson Name  Lesson Score      Status
0      Abel    Course_A         NaN           NaN   Completed
1      Bain    Course_A         NaN           NaN  Incomplete
2      Coot    Course_A         NaN           NaN         NaN
0      Abel    Course_B    Lesson_1         100.0  Incomplete
1      Abel    Course_B    Lesson_2         100.0  Incomplete
2      Abel    Course_B    Lesson_3         100.0  Incomplete
3      Abel    Course_B    Lesson_4         100.0  Incomplete
4      Bain    Course_B    Lesson_1         100.0  Incomplete
5      Bain    Course_B    Lesson_2         100.0  Incomplete
6      Coot    Course_B    Lesson_1         100.0    Complete
7      Coot    Course_B    Lesson_2         100.0    Complete
8      Coot    Course_B    Lesson_3         100.0    Complete
9      Coot    Course_B    Lesson_4         100.0    Complete
10     Coot    Course_B    Lesson_5         100.0    Complete

You can .reindex Course_A_df to add missing Employees:

Course_A_df = (
    Course_A_df.set_index("Employee")
    .reindex(Course_B_df["Employee"].unique())
    .reset_index()
)
Course_A_df["Course Name"] = Course_A_df["Course Name"].ffill().bfill()

Prints:

  Employee Course Name      Status
0     Abel    Course_A   Completed
1     Bain    Course_A  Incomplete
2     Coot    Course_A         NaN

Then add "Status" column to Course_B_df:

Course_B_df["Status"] = Course_B_df.groupby(
    ["Employee", "Course Name"], as_index=False
)["Lesson Score"].transform(
    lambda x: "Complete" if x.sum() == 500 else "Incomplete"
)

Prints:

   Employee Course Name Lesson Name  Lesson Score      Status
0      Abel    Course_B    Lesson_1           100  Incomplete
1      Abel    Course_B    Lesson_2           100  Incomplete
2      Abel    Course_B    Lesson_3           100  Incomplete
3      Abel    Course_B    Lesson_4           100  Incomplete
4      Bain    Course_B    Lesson_1           100  Incomplete
5      Bain    Course_B    Lesson_2           100  Incomplete
6      Coot    Course_B    Lesson_1           100    Complete
7      Coot    Course_B    Lesson_2           100    Complete
8      Coot    Course_B    Lesson_3           100    Complete
9      Coot    Course_B    Lesson_4           100    Complete
10     Coot    Course_B    Lesson_5           100    Complete

and finally .concat the two:

out = pd.concat([Course_A_df, Course_B_df])
print(out[["Employee", "Course Name", "Lesson Name", "Lesson Score", "Status"]])

Prints:

   Employee Course Name Lesson Name  Lesson Score      Status
0      Abel    Course_A         NaN           NaN   Completed
1      Bain    Course_A         NaN           NaN  Incomplete
2      Coot    Course_A         NaN           NaN         NaN
0      Abel    Course_B    Lesson_1         100.0  Incomplete
1      Abel    Course_B    Lesson_2         100.0  Incomplete
2      Abel    Course_B    Lesson_3         100.0  Incomplete
3      Abel    Course_B    Lesson_4         100.0  Incomplete
4      Bain    Course_B    Lesson_1         100.0  Incomplete
5      Bain    Course_B    Lesson_2         100.0  Incomplete
6      Coot    Course_B    Lesson_1         100.0    Complete
7      Coot    Course_B    Lesson_2         100.0    Complete
8      Coot    Course_B    Lesson_3         100.0    Complete
9      Coot    Course_B    Lesson_4         100.0    Complete
10     Coot    Course_B    Lesson_5         100.0    Complete
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文