How can I use pandas groupby + apply + lambda to split each group into sub-groups, where the sub-grouping follows a custom condition?
Sample data
import datetime
import pandas as pd

a = pd.DataFrame([[2,3],[2,1],[2,1],[3,4],[3,1],[3,1],[3,1],[3,1],[4,2],[4,1],[4,1],[4,1]],columns=['id','count'])
a['date'] = [datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S') for x in
    ['2016-12-28 15:17:00','2016-12-28 15:29:00','2017-01-05 09:32:00','2016-12-03 18:10:00','2016-12-10 11:31:00',
     '2016-12-14 09:32:00','2016-12-18 09:31:00','2016-12-22 09:32:00','2016-11-28 15:31:00','2016-12-01 16:11:00',
     '2016-12-10 09:31:00','2016-12-13 12:06:00']]
Implementation with an explicit loop
# Sort by id, then newest date first within each id
a.sort_values(by=['id','date'], ascending=[True,False], inplace=True)
a['id'] = a['id'].astype(str)
a['id_up'] = a['id'].shift(-1)
a['date_up'] = a['date'].shift(-1)
# Gap in days to the next (older) row of the same id; 0 on the last row of each id
a['date_diff'] = a.apply(lambda row: (row['date'] - row['date_up'])/pd.Timedelta(days=1) if row['id'] == row['id_up'] else 0, axis=1)
a = a.reset_index(drop=True)
a = a.drop(['id_up','date_up'], axis=1)
a['new'] = 0
for i in range(a.shape[0]):
    if i == 0 or a.loc[i,'id'] != a.loc[i-1,'id']:
        # First row of an id starts sub-group 1
        a.loc[i,'new'] = 1
    elif a.loc[i-1,'date_diff'] <= 4:
        # Gap to the previous row is at most 4 days: stay in the same sub-group
        a.loc[i,'new'] = a.loc[i-1,'new']
    else:
        # Gap is larger than 4 days: open the next sub-group
        a.loc[i,'new'] = a.loc[i-1,'new'] + 1
# Final label looks like "<id>-<sub-group number>"
a['new'] = a['id'].astype(str) + '-' + a['new'].astype(str)
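I have many data sources and am already running many processes, so for this processing step I can only consider threads or coroutines.
Still, I would like to do this with pandas' own features. Doing the same thing in Excel is much faster than my Python version.
Right now, with 3.5 million rows, it has been running for 12 hours and still has not finished...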
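For reference, here is a minimal sketch of how the loop might be expressed with pandas itself. It assumes the rule is "within each id, sorted newest first, start a new sub-group whenever the gap to the previous row is more than 4 days", which is what the loop appears to do. The names df, gap, starts_new, sub and label_apply are made up for this example, and it has only been checked against the sample data, not the 3.5-million-row set. The first variant uses only vectorized groupby operations; the second uses groupby + apply + lambda as in the title and is probably slower.

```python
import pandas as pd

# Same ids and dates as the sample data above
df = pd.DataFrame({
    'id': [2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4],
    'date': pd.to_datetime([
        '2016-12-28 15:17:00', '2016-12-28 15:29:00', '2017-01-05 09:32:00',
        '2016-12-03 18:10:00', '2016-12-10 11:31:00', '2016-12-14 09:32:00',
        '2016-12-18 09:31:00', '2016-12-22 09:32:00', '2016-11-28 15:31:00',
        '2016-12-01 16:11:00', '2016-12-10 09:31:00', '2016-12-13 12:06:00',
    ]),
})

# Sort by id, newest date first, as the loop version does
df = df.sort_values(['id', 'date'], ascending=[True, False]).reset_index(drop=True)

# Variant 1: vectorized.
# Gap between each row and the previous (newer) row of the same id;
# the first row of every id has no previous row, so its gap is NaT.
gap = df.groupby('id')['date'].shift(1) - df['date']

# 1 where a new sub-group starts (gap > 4 days), else 0; NaT compares as False
starts_new = (gap > pd.Timedelta(days=4)).astype(int)

# Running count of sub-group starts inside each id, numbered from 1
sub = starts_new.groupby(df['id']).cumsum() + 1
df['new'] = df['id'].astype(str) + '-' + sub.astype(str)

# Variant 2: groupby + apply + lambda, same rule applied per group.
# group_keys=False keeps the original row index on the result.
sub_apply = df.groupby('id', group_keys=False)['date'].apply(
    lambda s: (s.shift(1) - s).gt(pd.Timedelta(days=4)).cumsum() + 1
)
label_apply = df['id'].astype(str) + '-' + sub_apply.astype(str)

print(df)
```

On the sample data both variants give the same labels as the loop, for example 2-1, 2-2, 2-2 for id 2. The vectorized variant avoids calling Python code once per row or per group, which is usually where the time goes on millions of rows.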