Pandas drop_duplicates with conditions on the values of two other columns
I have a DataFrame with columns A, B and C.
Column A contains duplicates. Column B holds either an email value or NaN. Column C holds either the value 'wait' or a number.
My DataFrame has duplicate values in A. I would like to keep the rows that have a non-NaN value in B and a non-'wait' value in C (i.e. a number).
How could I do that on a DataFrame df?
I have tried df.drop_duplicates('A'), but I don't see any way to put conditions on the other columns.
Edit:
Sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3], 'B': ['[email protected]', np.nan, np.nan, '[email protected]', 'np.nan', np.nan], 'C': [123, 456, 567, 'wait', 'wait', 'wait']})
>>> df
   A                  B     C
0  1  [email protected]   123
1  1                NaN   456
2  2                NaN   567
3  2  [email protected]  wait
4  3             np.nan  wait
5  3                NaN  wait
I would like the resulting DataFrame to be:
>>> df
   A                  B     C
0  1  [email protected]   123
1  2  [email protected]   567
2  3             np.nan  wait
Thank you
Best,
Comments (1)
Solution: sort by columns A and C, using a test for whether C matches 'wait' so that non-'wait' rows come first, and then take the first non-missing value (if one exists) per group of column A.
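The code that accompanied this answer did not survive on this page, so the following is only a minimal sketch of the approach described above, not necessarily the answerer's exact code. It assumes the sample data from the question, with the redacted email placeholders and the literal string 'np.nan' kept as posted. The helper column name is_wait is hypothetical, introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data as posted in the question (email values appear redacted on the page;
# note that row 4 of B holds the *string* 'np.nan', not a missing value).
df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3, 3],
    'B': ['[email protected]', np.nan, np.nan, '[email protected]', 'np.nan', np.nan],
    'C': [123, 456, 567, 'wait', 'wait', 'wait'],
})

result = (
    # Hypothetical helper column: True where C is the placeholder 'wait'.
    df.assign(is_wait=df['C'].eq('wait'))
      # False sorts before True, so numeric C values come first within each A.
      .sort_values(['A', 'is_wait'])
      # GroupBy.first() takes the first non-missing value of each column per group.
      .groupby('A', as_index=False)
      .first()
      .drop(columns='is_wait')
)

print(result)
```

Note that first() works column by column, so the B and C values in one output row may come from different input rows. For A == 2 the email comes from row 3 and the number 567 from row 2, which is what the desired output in the question asks for. If whole rows must be kept intact instead, drop_duplicates('A') after a suitable sort would be the alternative, but it would not reproduce that output.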