带有总计/小计的 groupby

发布于 2025-01-10 11:37:12 字数 1644 浏览 0 评论 0原文

假设我有以下数据框，

Strategy    AssetClass    Symbol         Value    Indicator
Strat1      OPT           OPT_ABC1       50       -0.3
Strat1      OPT           OPT_ABC2       50       1.5
Strat1      STK           STK_ABC        50       2.7
Strat2      STK           STK_XYZ        70       -3.8
Strat3      OPT           OPT_MNO        25       10

我想生成以下内容：

Strategy    AssetClass    Symbol    Value    Indicator
Strat1                                       3.9
            OPT                              1.2
                          OPT_ABC1  50       -0.3
                          OPT_ABC2  50       1.5
            STK                              2.7
                          STK_ABC   50       2.7
Strat2                                       -3.8
            STK                              -3.8
                          STK_XYZ   70       -3.8
Strat3                                       10
            OPT                              10
                          OPT_MNO   25       10

因此，我们的想法是重新排列数据，其中包含每个策略的总计，然后是资产类别，然后是每个交易品种。 “值”列在符号级别可用，而“指标”列是子组的总和。

我想过使用 pd.pivot_table 但它似乎没有产生我正在寻找的总计/子总计。我认为我应该在 Strategy 上使用/循环 pd.groupby，然后在 Strategy/AssetClass 上循环另一个 groupby，然后在 Strategy/AssetClass/Symbol 上循环 groupby

df 是上面的数据帧，我这样做了：

container = []
for label, _df in df.groupby(['Strategy', 'AssetClass', 'Symbol']):
    _df.loc[f'{label}'] = _df[['Indicator']].sum()
    container.append(_df)

df_res = pd.concat(container)
print(df_res.fillna(''))

我的问题是小计插入到相应行之后，并使用标签作为索引。此外，我无法找出添加其他 lopps（即小计）的简单/Pythonic 方法

原文

Say I have the following dataframe

Strategy    AssetClass    Symbol         Value    Indicator
Strat1      OPT           OPT_ABC1       50       -0.3
Strat1      OPT           OPT_ABC2       50       1.5
Strat1      STK           STK_ABC        50       2.7
Strat2      STK           STK_XYZ        70       -3.8
Strat3      OPT           OPT_MNO        25       10

I would like to produce the following:

Strategy    AssetClass    Symbol    Value    Indicator
Strat1                                       3.9
            OPT                              1.2
                          OPT_ABC1  50       -0.3
                          OPT_ABC2  50       1.5
            STK                              2.7
                          STK_ABC   50       2.7
Strat2                                       -3.8
            STK                              -3.8
                          STK_XYZ   70       -3.8
Strat3                                       10
            OPT                              10
                          OPT_MNO   25       10

So the idea is to rearrange the data with totals per Strategy, then AssetClass and then per Symbol. The column "Value" is available at the Symbol level, while the column "Indicator" is the sum of the subgroup.

I thought of using pd.pivot_table but it doesn't seem to produce the totals/sub_totals I am looking for. I think I should use/loop over pd.groupby on Strategy and then loop over another groupby on Strategy/AssetClass and then loop over a groupby on Strategy/AssetClass/Symbol

With df being the dataframe above, I did this:

container = []
for label, _df in df.groupby(['Strategy', 'AssetClass', 'Symbol']):
    _df.loc[f'{label}'] = _df[['Indicator']].sum()
    container.append(_df)

df_res = pd.concat(container)
print(df_res.fillna(''))

My problem is that the subtotal is inserted after the corresponding rows and the label is used as index. Besides I can't figure out an easy/pythonic way of adding the other lopps(ie subtotals)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

铁憨憨 2025-01-17 11:37:12

您可以按不同的列进行聚合，因此为了提高性能，最好不要使用嵌套的 groupby.apply 而是使用多重聚合，最后通过 concat，按 DataFrame.reindex 以及前 2 列的最后排序：

df1 = df.groupby(['Strategy', 'AssetClass', 'Symbol'], as_index=False).sum()

df2 = (df1.groupby(['Strategy', 'AssetClass'], as_index=False)['Indicator'].sum()
          .assign(Symbol = ''))

df3 = (df1.groupby('Strategy', as_index=False)['Indicator'].sum()
          .assign(AssetClass = ''))

df = (pd.concat([df3, df2, df1])
        .reindex(df.columns, axis=1)
        .fillna('')
        .sort_values(['Strategy','AssetClass'], ignore_index=True))
print (df)
   Strategy AssetClass    Symbol Value  Indicator
0    Strat1                                   3.9
1    Strat1        OPT                        1.2
2    Strat1        OPT  OPT_ABC1  50.0       -0.3
3    Strat1        OPT  OPT_ABC2  50.0        1.5
4    Strat1        STK                        2.7
5    Strat1        STK   STK_ABC  50.0        2.7
6    Strat2                                  -3.8
7    Strat2        STK                       -3.8
8    Strat2        STK   STK_XYZ  70.0       -3.8
9    Strat3                                  10.0
10   Strat3        OPT                       10.0
11   Strat3        OPT   OPT_MNO  25.0       10.0

You can aggregate by different columns, so for performance is better not use nested groupby.apply but rather multple aggregation, last join them togehether by concat, change order of columns by DataFrame.reindex and last sorting per first 2 columns:

df1 = df.groupby(['Strategy', 'AssetClass', 'Symbol'], as_index=False).sum()

df2 = (df1.groupby(['Strategy', 'AssetClass'], as_index=False)['Indicator'].sum()
          .assign(Symbol = ''))

df3 = (df1.groupby('Strategy', as_index=False)['Indicator'].sum()
          .assign(AssetClass = ''))

df = (pd.concat([df3, df2, df1])
        .reindex(df.columns, axis=1)
        .fillna('')
        .sort_values(['Strategy','AssetClass'], ignore_index=True))
print (df)
   Strategy AssetClass    Symbol Value  Indicator
0    Strat1                                   3.9
1    Strat1        OPT                        1.2
2    Strat1        OPT  OPT_ABC1  50.0       -0.3
3    Strat1        OPT  OPT_ABC2  50.0        1.5
4    Strat1        STK                        2.7
5    Strat1        STK   STK_ABC  50.0        2.7
6    Strat2                                  -3.8
7    Strat2        STK                       -3.8
8    Strat2        STK   STK_XYZ  70.0       -3.8
9    Strat3                                  10.0
10   Strat3        OPT                       10.0
11   Strat3        OPT   OPT_MNO  25.0       10.0

回复收藏 0 原文

~没有更多了~