Efficiently reduce group size in a dataframe
I have a dataframe which I am grouping by the name in each row using the groupby function. I then want to reduce each group to a given size and add the reduced groups back into a database for use by other processes. Currently I am doing this in a for loop, but this seems really inefficient. Does pandas have a method to do this more efficiently?
import pandas as pd

grouped = df.groupby(['NAME'])
total = grouped.ngroups
df_final = pd.DataFrame()
for name, group in grouped:
    target_number_rows = 10
    if len(group.index) > target_number_rows:
        # keep every k-th row so that roughly target_number_rows remain
        shortened = group[::int(len(group.index) / target_number_rows)]
        df_final = pd.concat([df_final, shortened], ignore_index=True)
Comments (1)
Group by the name and apply sample, which picks N rows at random within each group, where N is either your desired amount or the full size of the group if it has fewer rows than that. Otherwise, take the first N or the last N rows of each group.
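The answer's original snippets are not preserved here, so the following is only a minimal sketch of the two options, not the answerer's exact code. It assumes the grouping column is called NAME, as in the question; the helper names, the seed parameter, and the toy dataframe are made up for illustration. For the random case it shuffles the whole frame once and then takes head(n) per group instead of calling sample inside apply, which should have the same effect (at most n random rows per group, smaller groups kept whole) without a Python-level loop.

```python
import pandas as pd


def sample_per_group(df, key="NAME", n=10, seed=None):
    """Return at most n randomly chosen rows per group (hypothetical helper)."""
    # Shuffle all rows once, then keep the first n rows seen for each group.
    shuffled = df.sample(frac=1, random_state=seed)
    return shuffled.groupby(key).head(n).reset_index(drop=True)


def first_per_group(df, key="NAME", n=10):
    """Return the first n rows of each group, in the original row order."""
    return df.groupby(key).head(n)


def last_per_group(df, key="NAME", n=10):
    """Return the last n rows of each group, in the original row order."""
    return df.groupby(key).tail(n)


# Toy data (invented): group "a" has 25 rows, group "b" has only 3.
df = pd.DataFrame({"NAME": ["a"] * 25 + ["b"] * 3, "value": range(28)})

sampled = sample_per_group(df, n=10, seed=0)
first = first_per_group(df, n=10)
last = last_per_group(df, n=10)
```

Note that, unlike the loop in the question, these helpers keep groups that already have n rows or fewer instead of skipping them, and they never return more than n rows for a group.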