Filter a DataFrame to get the leaves of a tree structure
I have a DataFrame whose columns hold the levels of a hierarchy (similar to a directory path). I am trying to keep only the records at the deepest generation of each branch (the leaves of the hierarchy tree). I tried a couple of approaches with transform and groupby but was unable to get the desired output.
Code
import numpy as np
import pandas as pd

# Input: one row per node of the hierarchy; NaN marks an unused level
df = pd.DataFrame({'lvl1': ['aa','aa','aa','aa','bb','bb','bb','bb','cc','aa'],
                   'lvl2': [np.nan,'xx','xx','xx',np.nan,'yy','yy','zz',np.nan,'sa'],
                   'lvl3': [np.nan,np.nan,'ww','qq',np.nan,np.nan,'rr',np.nan,np.nan,'jj'],
                   'value': [12,4,7,22,76,0,18,47,10,2]})

# Desired output: only the leaf rows
result = pd.DataFrame({'lvl1': ['aa','aa','bb','bb','cc','aa'],
                       'lvl2': ['xx','xx','yy','zz',np.nan,'sa'],
                       'lvl3': ['ww','qq','rr',np.nan,np.nan,'jj'],
                       'value': [7,22,18,47,10,2]})
Appreciate your help
Comments (1)
If you need to filter rows by the maximum number of unique values per row and, for each category that does not reach that maximum, keep only the last row, use:
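A minimal sketch of one way to get the expected output, assuming a leaf is any row whose path of non-missing levels is not a strict prefix of another row's path. This is a plain prefix check rather than a groupby/transform solution, and the names `levels`, `paths`, `parents`, `mask` and `leaves` are illustrative, not taken from the question or the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lvl1': ['aa','aa','aa','aa','bb','bb','bb','bb','cc','aa'],
                   'lvl2': [np.nan,'xx','xx','xx',np.nan,'yy','yy','zz',np.nan,'sa'],
                   'lvl3': [np.nan,np.nan,'ww','qq',np.nan,np.nan,'rr',np.nan,np.nan,'jj'],
                   'value': [12,4,7,22,76,0,18,47,10,2]})

levels = ['lvl1', 'lvl2', 'lvl3']  # assumed: hierarchy columns, ordered from root to leaf

# Path of each row: a tuple of its non-missing level values, e.g. ('aa', 'xx', 'ww')
paths = [tuple(v for v in row if pd.notna(v))
         for row in df[levels].itertuples(index=False)]

# Every strict prefix of a path is an internal node (it has at least one descendant)
parents = {p[:i] for p in paths for i in range(1, len(p))}

# A row is a leaf when its own path never appears as someone else's prefix
mask = pd.Series([p not in parents for p in paths], index=df.index)
leaves = df[mask].reset_index(drop=True)

print(leaves)
```

Note that the row `aa / sa / jj` is kept even though no `aa / sa` row exists, and `cc` is kept because nothing sits below it. Duplicate paths, if any, would all be kept by this check.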