pandas groupby + apply + lambda: how do I split each group into sub-groups (where the sub-grouping uses a custom condition)?

Posted 2022-09-12 03:07:45, 1692 characters, 62 views, 0 comments

Sample data
import datetime
from datetime import timedelta

import pandas as pd

a = pd.DataFrame([[2,3],[2,1],[2,1],[3,4],[3,1],[3,1],[3,1],[3,1],[4,2],[4,1],[4,1],[4,1]],columns=['id','count'])
# Parse the timestamp strings into datetime objects.
a['date'] = [datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S') for x in
          ['2016-12-28 15:17:00','2016-12-28 15:29:00','2017-01-05 09:32:00','2016-12-03 18:10:00','2016-12-10 11:31:00',
           '2016-12-14 09:32:00','2016-12-18 09:31:00','2016-12-22 09:32:00','2016-11-28 15:31:00','2016-12-01 16:11:00',
           '2016-12-10 09:31:00','2016-12-13 12:06:00']]

Loop-based implementation
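# Sort by id ascending, then by date descending (newest first within each id).
a.sort_values(by=['id','date'],ascending=[True,False],inplace=True)
a['id'] = a['id'].astype(str)
# Next row's id and date, used to measure the gap to the following row.
a['id_up'] = a['id'].shift(-1)
a['date_up'] = a['date'].shift(-1)
# Gap in days to the next row when it has the same id, otherwise 0.
a['date_diff'] = a.apply(lambda row: (row['date'] - row['date_up'])/timedelta(days=1) if row['id'] == row['id_up'] else 0, axis=1)
a = a.reset_index(drop=True)
a = a.drop(['id_up','date_up'],axis=1)
a['new'] = 0
for i in range(a.shape[0]):
    if i == 0:
        a.loc[i,'new'] = 1
    else:
        if a.loc[i,'id'] != a.loc[i-1,'id']:
            # First row of a new id: restart the sub-group counter.
            a.loc[i,'new'] = 1
        else:
            if a.loc[i-1,'date_diff'] <= 4:
                # Within 4 days of the previous row: same sub-group.
                a.loc[i,'new'] = a.loc[i-1,'new']
            else:
                # More than 4 days apart: start the next sub-group.
                a.loc[i,'new'] = a.loc[i-1,'new'] + 1

# Final label is "<id>-<sub-group number>".
a['new'] = a['id'].astype(str) + '-' + a['new'].astype(str)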
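I have many data sources and am already running a lot of processes, so for this processing step I can only consider threads or coroutines.
But I would rather do this with pandas' own features; my Excel implementation is much faster than the Python one.
Right now, with 3.5 million rows, the job has been running for 12 hours and still hasn't finished...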
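For reference, the row-by-row loop can probably be replaced by grouped, vectorized operations. The following is a minimal sketch, not a tested solution for the full 3.5 million rows. It assumes the rule is exactly the one in the loop: within each id, sorted newest first, a new sub-group starts whenever the gap to the previous row is more than 4 days. The names gap and starts_new are made up for illustration, pd.to_datetime is swapped in for strptime, and the intermediate date_diff column is not kept.

```python
import pandas as pd

# Same sample data as in the question.
a = pd.DataFrame(
    [[2, 3], [2, 1], [2, 1], [3, 4], [3, 1], [3, 1], [3, 1], [3, 1],
     [4, 2], [4, 1], [4, 1], [4, 1]],
    columns=['id', 'count'],
)
a['date'] = pd.to_datetime([
    '2016-12-28 15:17:00', '2016-12-28 15:29:00', '2017-01-05 09:32:00',
    '2016-12-03 18:10:00', '2016-12-10 11:31:00', '2016-12-14 09:32:00',
    '2016-12-18 09:31:00', '2016-12-22 09:32:00', '2016-11-28 15:31:00',
    '2016-12-01 16:11:00', '2016-12-10 09:31:00', '2016-12-13 12:06:00',
])

# Sort like the loop version: id ascending, date descending.
a = a.sort_values(['id', 'date'], ascending=[True, False]).reset_index(drop=True)

# Gap to the previous (more recent) row within the same id.
# diff() is negative because dates descend, so negate it.
# The first row of each id gets NaT.
gap = -a.groupby('id')['date'].diff()

# A new sub-group starts wherever the gap exceeds 4 days.
# NaT compares as False, so the first row of each id does not count as a start.
starts_new = gap > pd.Timedelta(days=4)

# Running count of sub-group starts per id, shifted so numbering begins at 1.
a['new'] = starts_new.astype(int).groupby(a['id']).cumsum() + 1

# Final label is "<id>-<sub-group number>", as in the loop version.
a['new'] = a['id'].astype(str) + '-' + a['new'].astype(str)

print(a[['id', 'date', 'new']])
```

On the sample data, id 2 gets the labels 2-1, 2-2, 2-2, which matches what the loop produces. The same per-id logic could also be written with groupby('id').apply(...) and a lambda, but calling a Python function once per group is usually slower than the built-in diff and cumsum when there are many ids.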
