带有总计/小计的 groupby
假设我有以下数据框,
Strategy AssetClass Symbol Value Indicator
Strat1 OPT OPT_ABC1 50 -0.3
Strat1 OPT OPT_ABC2 50 1.5
Strat1 STK STK_ABC 50 2.7
Strat2 STK STK_XYZ 70 -3.8
Strat3 OPT OPT_MNO 25 10
我想生成以下内容:
Strategy AssetClass Symbol Value Indicator
Strat1 3.9
OPT 1.2
OPT_ABC1 50 -0.3
OPT_ABC2 50 1.5
STK 2.7
STK_ABC 50 2.7
Strat2 -3.8
STK -3.8
STK_XYZ 70 -3.8
Strat3 10
OPT 10
OPT_MNO 25 10
因此,我们的想法是重新排列数据,其中包含每个策略的总计,然后是资产类别,然后是每个交易品种。 “值”列在符号级别可用,而“指标”列是子组的总和。
我想过使用 pd.pivot_table 但它似乎没有产生我正在寻找的总计/子总计。我认为我应该在 Strategy 上使用/循环 pd.groupby,然后在 Strategy/AssetClass 上循环另一个 groupby,然后在 Strategy/AssetClass/Symbol 上循环 groupby
df 是上面的数据帧,我这样做了:
container = []
for label, _df in df.groupby(['Strategy', 'AssetClass', 'Symbol']):
_df.loc[f'{label}'] = _df[['Indicator']].sum()
container.append(_df)
df_res = pd.concat(container)
print(df_res.fillna(''))
我的问题是小计插入到相应行之后,并使用标签作为索引。此外,我无法找出添加其他 lopps(即小计)的简单/Pythonic 方法
Say I have the following dataframe
Strategy AssetClass Symbol Value Indicator
Strat1 OPT OPT_ABC1 50 -0.3
Strat1 OPT OPT_ABC2 50 1.5
Strat1 STK STK_ABC 50 2.7
Strat2 STK STK_XYZ 70 -3.8
Strat3 OPT OPT_MNO 25 10
I would like to produce the following:
Strategy AssetClass Symbol Value Indicator
Strat1 3.9
OPT 1.2
OPT_ABC1 50 -0.3
OPT_ABC2 50 1.5
STK 2.7
STK_ABC 50 2.7
Strat2 -3.8
STK -3.8
STK_XYZ 70 -3.8
Strat3 10
OPT 10
OPT_MNO 25 10
So the idea is to rearrange the data with totals per Strategy, then AssetClass and then per Symbol. The column "Value" is available at the Symbol level, while the column "Indicator" is the sum of the subgroup.
I thought of using pd.pivot_table but it doesn't seem to produce the totals/sub_totals I am looking for. I think I should use/loop over pd.groupby on Strategy and then loop over another groupby on Strategy/AssetClass and then loop over a groupby on Strategy/AssetClass/Symbol
With df being the dataframe above, I did this:
container = []
for label, _df in df.groupby(['Strategy', 'AssetClass', 'Symbol']):
_df.loc[f'{label}'] = _df[['Indicator']].sum()
container.append(_df)
df_res = pd.concat(container)
print(df_res.fillna(''))
My problem is that the subtotal is inserted after the corresponding rows and the label is used as index. Besides I can't figure out an easy/pythonic way of adding the other lopps(ie subtotals)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您可以按不同的列进行聚合,因此为了提高性能,最好不要使用嵌套的
groupby.apply
而是使用多重聚合,最后通过concat
,按DataFrame.reindex
以及前 2 列的最后排序:You can aggregate by different columns, so for performance is better not use nested
groupby.apply
but rather multple aggregation, last join them togehether byconcat
, change order of columns byDataFrame.reindex
and last sorting per first 2 columns: