Comparing DataFrames, including detailed insight into the data
I have a Python project with two DataFrames:
df_testR with columns={'Name', 'City','Licence', 'Amount'}
df_testF with columns={'Name', 'City','Licence', 'Amount'}
I want to compare the two DataFrames. The result should be a DataFrame where I can see the Name, City, Licence and Amount. Normally, df_testR and df_testF should be exactly the same.
If they are not the same, I want to see the difference between Amount_R and Amount_F.
I referred to: Diff between two dataframes in pandas
But I receive a table containing only True and False values:
Name | City | Licence | Amount |
---|---|---|---|
True | True | True | False |
Instead, I'd like to get a table that lists ONLY the rows where differences occur and shows the differing values side by side, like this:
Name | City | Licence | Amount_R | Amount_F |
---|---|---|---|---|
Paul | NY | YES | 200 | 500 |
Here, both tables contain Paul, NY and Licence = YES, but table R has 200 as the Amount and table F has 500. I want my analysis to produce a table that captures only the rows where such differences occur.
Could someone help?
Comments (1)
First find the missing rows and print them:
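The code for this step is not shown here, so the following is only a minimal sketch of one way to do it, not necessarily what the answer used. It assumes that `Name`, `City` and `Licence` together identify a row, and the sample data is made up for illustration. An outer merge with `indicator=True` marks each key combination as present in both frames or in only one of them.

```python
import pandas as pd

# Assumption: these three columns together identify a row.
KEYS = ["Name", "City", "Licence"]

# Hypothetical sample data, not taken from the question.
df_testR = pd.DataFrame({
    "Name": ["Anna", "Paul", "Rita"],
    "City": ["LA", "NY", "SF"],
    "Licence": ["NO", "YES", "YES"],
    "Amount": [100, 200, 300],
})
df_testF = pd.DataFrame({
    "Name": ["Anna", "Paul", "Tom"],
    "City": ["LA", "NY", "SF"],
    "Licence": ["NO", "YES", "NO"],
    "Amount": [100, 500, 50],
})

# Outer merge on the key columns only; the `_merge` column records
# whether a key was found in the left frame, the right frame, or both.
presence = df_testR[KEYS].merge(
    df_testF[KEYS], on=KEYS, how="outer", indicator=True
)

# Keep only keys that are missing from one of the two frames.
missing = presence[presence["_merge"] != "both"]
print(missing)
```

In `missing`, a `_merge` value of `left_only` means the row exists only in `df_testR`, and `right_only` means it exists only in `df_testF`.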
Then drop these rows and merge:
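The original code for this step is also not shown. As a hedged sketch, an inner merge on the key columns can cover both parts at once: rows missing from either frame are dropped automatically, and the `suffixes` argument yields the `Amount_R` and `Amount_F` columns asked for in the question. The key columns and sample data are assumptions for illustration. Because the merge matches on keys, this variant does not depend on row order.

```python
import pandas as pd

# Assumption: these three columns together identify a row.
KEYS = ["Name", "City", "Licence"]

# Hypothetical sample data, not taken from the question.
df_testR = pd.DataFrame({
    "Name": ["Anna", "Paul", "Rita"],
    "City": ["LA", "NY", "SF"],
    "Licence": ["NO", "YES", "YES"],
    "Amount": [100, 200, 300],
})
df_testF = pd.DataFrame({
    "Name": ["Anna", "Paul", "Tom"],
    "City": ["LA", "NY", "SF"],
    "Licence": ["NO", "YES", "NO"],
    "Amount": [100, 500, 50],
})

# Inner merge keeps only keys present in both frames, so rows that are
# missing on either side (Rita, Tom) drop out. The overlapping `Amount`
# column is renamed to Amount_R (left) and Amount_F (right).
merged = df_testR.merge(
    df_testF, on=KEYS, how="inner", suffixes=("_R", "_F")
)
print(merged)
```

One caveat: if a key combination occurs more than once in either frame, the merge produces every pairing of those rows, so it may be worth checking for duplicate keys first.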
Now remove the rows that have the same amount:
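Again, the original code is missing, so this is a sketch under the same assumptions (made-up sample data, `Name`/`City`/`Licence` as keys). After merging, a boolean mask comparing the two amount columns keeps only the rows that differ.

```python
import pandas as pd

# Assumption: these three columns together identify a row.
KEYS = ["Name", "City", "Licence"]

# Hypothetical sample data, not taken from the question.
df_testR = pd.DataFrame({
    "Name": ["Anna", "Paul"],
    "City": ["LA", "NY"],
    "Licence": ["NO", "YES"],
    "Amount": [100, 200],
})
df_testF = pd.DataFrame({
    "Name": ["Anna", "Paul"],
    "City": ["LA", "NY"],
    "Licence": ["NO", "YES"],
    "Amount": [100, 500],
})

# Merge on the keys so both amounts sit side by side in one row.
merged = df_testR.merge(
    df_testF, on=KEYS, how="inner", suffixes=("_R", "_F")
)

# Keep only the rows whose amounts differ.
# Note: NaN != NaN evaluates to True, so rows where both amounts are
# missing would also be reported as differences.
diff = merged[merged["Amount_R"] != merged["Amount_F"]]
print(diff)
```

With this sample data, `diff` contains the single row for Paul, with 200 in `Amount_R` and 500 in `Amount_F`, which matches the table layout requested in the question.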
I assumed the DFs are sorted.
If you're dealing with very large DFs, it might be better to filter them first so that the merge runs faster.