Pandas Pivot并包括具有可变值的列

发布于 2025-02-11 19:46:33 字数 2810 浏览 2 评论 0原文

我有此dataframe jpm_2021：

      SRC     SRCDate Ticker  Coupon  Vintage   Bal   WAC  WAM  WALA       LNSZ  LTV  FICO  Refi% Month_Assessed       CPR Month_key
894   JPM  02/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  7.536801       M+2
1528  JPM  03/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  5.131145       M+1
2162  JPM  04/07/2021   FNCI     1.5     2020  28.0  2.25  173     6  292 / 281   60   777     91            Apr  7.233214         M
2796  JPM  05/07/2021   FNCI     1.5     2020  27.6  2.25  171     7  292 / 279   60   777     91            Apr  8.900000       M-1
3430  JPM  06/07/2021   FNCI     1.5     2020  27.2  2.25  170     8  292 / 277   60   777     91            Apr  8.900000       M-2

并使用pandas pivot（）函数如下：

jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()

将其转换为：

    SRC Ticker  Coupon  Vintage Month_Assessed  M    M+1    M+2    M-1  M-2
0   JPM FNCI    1.5     2020    Apr            7.23  5.13   7.53   8.9  8.9
1   JPM FNCI    1.5     2020    Aug            15.16 14.92  11.97  24.9 24.9
2   JPM FNCI    1.5     2020    Dec            11.58 14.51  19.00  5.0  5.0
3   JPM FNCI    1.5     2020    Feb            6.70  4.18   9.84   6.6  8.8
4   JPM FNCI    1.5     2020    Jan            4.29  10.19  12.88  6.6  5.0

问题是我要包括所有中间列（BAL（$ bn）通过refi％），但是当我将其添加到index = in jpm_2021.pivot（）中时，它添加了许多行，M - ＆GT; M-2列开始具有许多nan值。这是因为这些列的值在整个数据范围内略有变化。有什么想法，我如何在不生成所有NAN的情况下将这些列添加到枢纽的数据框架中？例如，如果我确实包括bal（$ bn），则看起来像这样：

SRC Ticker  Coupon  Vintage Month_Assessed  Bal ($bn)   M   M+1 M+2 M-1 M-2
0   JPM     FNCI    1.5 2020    Apr            27.2   NaN   NaN NaN NaN 8.9
1   JPM     FNCI    1.5 2020    Apr            27.6   NaN   NaN NaN 8.9 NaN
2   JPM     FNCI    1.5 2020    Apr            28     7.23  NaN NaN NaN NaN
3   JPM     FNCI    1.5 2020    Apr            28.7   NaN  5.13 7.53 NaN NaN
4   JPM     FNCI    1.5 2020    Aug            24.9       NaN   NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM    G2SF    5.5 2008    May        1.2        24    21 21 24.3 NaN
7070 JPM    G2SF    5.5 2008    Nov        1.1        23    21 20 23.2  NaN
7071 JPM    G2SF    5.5 2008    Nov        1.3        NaN  NaN  NaN NaN 21.9
7072 JPM    G2SF    5.5 2008    Oct        1.1        21   20 23 24 25  NaN
7073 JPM    G2SF    5.5 2008    Sep        1.1        21   24 25 22 22 23

原文

I have this dataframe jpm_2021:

      SRC     SRCDate Ticker  Coupon  Vintage   Bal   WAC  WAM  WALA       LNSZ  LTV  FICO  Refi% Month_Assessed       CPR Month_key
894   JPM  02/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  7.536801       M+2
1528  JPM  03/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  5.131145       M+1
2162  JPM  04/07/2021   FNCI     1.5     2020  28.0  2.25  173     6  292 / 281   60   777     91            Apr  7.233214         M
2796  JPM  05/07/2021   FNCI     1.5     2020  27.6  2.25  171     7  292 / 279   60   777     91            Apr  8.900000       M-1
3430  JPM  06/07/2021   FNCI     1.5     2020  27.2  2.25  170     8  292 / 277   60   777     91            Apr  8.900000       M-2

And use pandas pivot() function as below:

jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()

To convert it to:

    SRC Ticker  Coupon  Vintage Month_Assessed  M    M+1    M+2    M-1  M-2
0   JPM FNCI    1.5     2020    Apr            7.23  5.13   7.53   8.9  8.9
1   JPM FNCI    1.5     2020    Aug            15.16 14.92  11.97  24.9 24.9
2   JPM FNCI    1.5     2020    Dec            11.58 14.51  19.00  5.0  5.0
3   JPM FNCI    1.5     2020    Feb            6.70  4.18   9.84   6.6  8.8
4   JPM FNCI    1.5     2020    Jan            4.29  10.19  12.88  6.6  5.0

The problem is I want to include all the middle columns (Bal ($bn) through Refi%) but when I add those to the index= call in jpm_2021.pivot(), it adds many rows and the M-->M-2 columns start to have many NaN values. This is because the value of those columns changes slightly throughout the dataframe. Any ideas how I can add those columns to the pivoted dataframe without generating all the NaNs? If I do include for example, Bal ($bn), it looks like this:

SRC Ticker  Coupon  Vintage Month_Assessed  Bal ($bn)   M   M+1 M+2 M-1 M-2
0   JPM     FNCI    1.5 2020    Apr            27.2   NaN   NaN NaN NaN 8.9
1   JPM     FNCI    1.5 2020    Apr            27.6   NaN   NaN NaN 8.9 NaN
2   JPM     FNCI    1.5 2020    Apr            28     7.23  NaN NaN NaN NaN
3   JPM     FNCI    1.5 2020    Apr            28.7   NaN  5.13 7.53 NaN NaN
4   JPM     FNCI    1.5 2020    Aug            24.9       NaN   NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM    G2SF    5.5 2008    May        1.2        24    21 21 24.3 NaN
7070 JPM    G2SF    5.5 2008    Nov        1.1        23    21 20 23.2  NaN
7071 JPM    G2SF    5.5 2008    Nov        1.3        NaN  NaN  NaN NaN 21.9
7072 JPM    G2SF    5.5 2008    Oct        1.1        21   20 23 24 25  NaN
7073 JPM    G2SF    5.5 2008    Sep        1.1        21   24 25 22 22 23

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

〃温暖了心ぐ 2025-02-18 19:46:33

您要的东西没有多大意义。如果您旋转数据框，并制作其索引列，而索引列对于每个枢纽值都不会带有伴随的值...则可以预期拥有NAN值。

您可以使它变得非常宽泛：

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key')

...

                                             SRCDate                                                   Bal                           WAC                          WAM                     WALA              ...       LNSZ                                  LTV                 FICO                     Refi%                       CPR
Month_key                                          M         M+1         M+2         M-1         M-2     M   M+1   M+2   M-1   M-2     M   M+1   M+2   M-1   M-2    M  M+1  M+2  M-1  M-2    M M+1 M+2 M-1  ...        M+1        M+2        M-1        M-2   M M+1 M+2 M-1 M-2    M  M+1  M+2  M-1  M-2     M M+1 M+2 M-1 M-2         M       M+1       M+2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed                                                                                                                                                                    ...
JPM FNCI   1.5    2020    Apr             04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021  28.0  28.7  28.7  27.6  27.2  2.25  2.25  2.25  2.25  2.25  173  175  175  171  170    6   4   4   7  ...  293 / 286  293 / 286  292 / 279  292 / 277  60  60  60  60  60  777  777  777  777  777    91  91  91  91  91  7.233214  5.131145  7.536801  8.9  8.9

或者可以解开具有有趣的索引：

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key').stack(0)

...

Month_key                                                  M         M+1         M+2         M-1         M-2
SRC Ticker Coupon Vintage Month_Assessed
JPM FNCI   1.5    2020    Apr            Bal            28.0        28.7        28.7        27.6        27.2
                                         CPR        7.233214    5.131145    7.536801         8.9         8.9
                                         FICO            777         777         777         777         777
                                         LNSZ      292 / 281   293 / 286   293 / 286   292 / 279   292 / 277
                                         LTV              60          60          60          60          60
                                         Refi%            91          91          91          91          91
                                         SRCDate  04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021
                                         WAC            2.25        2.25        2.25        2.25        2.25
                                         WALA              6           4           4           7           8
                                         WAM             173         175         175         171         170

但是，除非您更新那些略有不同以至于相同的列，否则它将无法神奇地为您做到这一点并从中创建一些东西没有什么。

一种方法可能用平均值替代的方法：

df['Bal'] = df.groupby(['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'])['Bal'].transform('mean')
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed', 'Bal'], columns='Month_key', values='CPR')

输出：

Month_key                                              M       M+1       M+2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed Bal
JPM FNCI   1.5    2020    Apr            28.04  7.233214  5.131145  7.536801  8.9  8.9

What you're asking for doesn't make much sense. If you pivot a dataframe, and make its index columns that won't have an accompanying value for every pivoted value... then having NaN values is to be expected.

You could make it very wide like:

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key')

...

                                             SRCDate                                                   Bal                           WAC                          WAM                     WALA              ...       LNSZ                                  LTV                 FICO                     Refi%                       CPR
Month_key                                          M         M+1         M+2         M-1         M-2     M   M+1   M+2   M-1   M-2     M   M+1   M+2   M-1   M-2    M  M+1  M+2  M-1  M-2    M M+1 M+2 M-1  ...        M+1        M+2        M-1        M-2   M M+1 M+2 M-1 M-2    M  M+1  M+2  M-1  M-2     M M+1 M+2 M-1 M-2         M       M+1       M+2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed                                                                                                                                                                    ...
JPM FNCI   1.5    2020    Apr             04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021  28.0  28.7  28.7  27.6  27.2  2.25  2.25  2.25  2.25  2.25  173  175  175  171  170    6   4   4   7  ...  293 / 286  293 / 286  292 / 279  292 / 277  60  60  60  60  60  777  777  777  777  777    91  91  91  91  91  7.233214  5.131145  7.536801  8.9  8.9

Or maybe unstack that to have an interesting index:

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key').stack(0)

...

Month_key                                                  M         M+1         M+2         M-1         M-2
SRC Ticker Coupon Vintage Month_Assessed
JPM FNCI   1.5    2020    Apr            Bal            28.0        28.7        28.7        27.6        27.2
                                         CPR        7.233214    5.131145    7.536801         8.9         8.9
                                         FICO            777         777         777         777         777
                                         LNSZ      292 / 281   293 / 286   293 / 286   292 / 279   292 / 277
                                         LTV              60          60          60          60          60
                                         Refi%            91          91          91          91          91
                                         SRCDate  04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021
                                         WAC            2.25        2.25        2.25        2.25        2.25
                                         WALA              6           4           4           7           8
                                         WAM             173         175         175         171         170

But unless you update those columns that are slightly different to be the same, it's not going to magically be able to do that for you and create something out of nothing.

One way to approach that might be replacing with the average:

df['Bal'] = df.groupby(['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'])['Bal'].transform('mean')
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed', 'Bal'], columns='Month_key', values='CPR')

Output:

Month_key                                              M       M+1       M+2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed Bal
JPM FNCI   1.5    2020    Apr            28.04  7.233214  5.131145  7.536801  8.9  8.9

回复收藏 0 原文

~没有更多了~