Pandas Pivot并包括具有可变值的列
我有此dataframe jpm_2021
:
SRC SRCDate Ticker Coupon Vintage Bal WAC WAM WALA LNSZ LTV FICO Refi% Month_Assessed CPR Month_key
894 JPM 02/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 7.536801 M+2
1528 JPM 03/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 5.131145 M+1
2162 JPM 04/07/2021 FNCI 1.5 2020 28.0 2.25 173 6 292 / 281 60 777 91 Apr 7.233214 M
2796 JPM 05/07/2021 FNCI 1.5 2020 27.6 2.25 171 7 292 / 279 60 777 91 Apr 8.900000 M-1
3430 JPM 06/07/2021 FNCI 1.5 2020 27.2 2.25 170 8 292 / 277 60 777 91 Apr 8.900000 M-2
并使用pandas pivot()
函数如下:
jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()
将其转换为:
SRC Ticker Coupon Vintage Month_Assessed M M+1 M+2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 7.23 5.13 7.53 8.9 8.9
1 JPM FNCI 1.5 2020 Aug 15.16 14.92 11.97 24.9 24.9
2 JPM FNCI 1.5 2020 Dec 11.58 14.51 19.00 5.0 5.0
3 JPM FNCI 1.5 2020 Feb 6.70 4.18 9.84 6.6 8.8
4 JPM FNCI 1.5 2020 Jan 4.29 10.19 12.88 6.6 5.0
问题是我要包括所有中间列(BAL($ bn)
通过refi%
),但是当我将其添加到index =
in
jpm_2021.pivot()
中时,它添加了许多行,M - > M-2列开始具有许多nan
值。这是因为这些列的值在整个数据范围内略有变化。有什么想法,我如何在不生成所有NAN的情况下将这些列添加到枢纽的数据框架中?例如,如果我确实包括bal($ bn)
,则看起来像这样:
SRC Ticker Coupon Vintage Month_Assessed Bal ($bn) M M+1 M+2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 27.2 NaN NaN NaN NaN 8.9
1 JPM FNCI 1.5 2020 Apr 27.6 NaN NaN NaN 8.9 NaN
2 JPM FNCI 1.5 2020 Apr 28 7.23 NaN NaN NaN NaN
3 JPM FNCI 1.5 2020 Apr 28.7 NaN 5.13 7.53 NaN NaN
4 JPM FNCI 1.5 2020 Aug 24.9 NaN NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM G2SF 5.5 2008 May 1.2 24 21 21 24.3 NaN
7070 JPM G2SF 5.5 2008 Nov 1.1 23 21 20 23.2 NaN
7071 JPM G2SF 5.5 2008 Nov 1.3 NaN NaN NaN NaN 21.9
7072 JPM G2SF 5.5 2008 Oct 1.1 21 20 23 24 25 NaN
7073 JPM G2SF 5.5 2008 Sep 1.1 21 24 25 22 22 23
I have this dataframe jpm_2021
:
SRC SRCDate Ticker Coupon Vintage Bal WAC WAM WALA LNSZ LTV FICO Refi% Month_Assessed CPR Month_key
894 JPM 02/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 7.536801 M+2
1528 JPM 03/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 5.131145 M+1
2162 JPM 04/07/2021 FNCI 1.5 2020 28.0 2.25 173 6 292 / 281 60 777 91 Apr 7.233214 M
2796 JPM 05/07/2021 FNCI 1.5 2020 27.6 2.25 171 7 292 / 279 60 777 91 Apr 8.900000 M-1
3430 JPM 06/07/2021 FNCI 1.5 2020 27.2 2.25 170 8 292 / 277 60 777 91 Apr 8.900000 M-2
And use pandas pivot()
function as below:
jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()
To convert it to:
SRC Ticker Coupon Vintage Month_Assessed M M+1 M+2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 7.23 5.13 7.53 8.9 8.9
1 JPM FNCI 1.5 2020 Aug 15.16 14.92 11.97 24.9 24.9
2 JPM FNCI 1.5 2020 Dec 11.58 14.51 19.00 5.0 5.0
3 JPM FNCI 1.5 2020 Feb 6.70 4.18 9.84 6.6 8.8
4 JPM FNCI 1.5 2020 Jan 4.29 10.19 12.88 6.6 5.0
The problem is I want to include all the middle columns (Bal ($bn)
through Refi%
) but when I add those to the index=
call in jpm_2021.pivot()
, it adds many rows and the M-->M-2 columns start to have many NaN
values. This is because the value of those columns changes slightly throughout the dataframe. Any ideas how I can add those columns to the pivoted dataframe without generating all the NaNs? If I do include for example, Bal ($bn)
, it looks like this:
SRC Ticker Coupon Vintage Month_Assessed Bal ($bn) M M+1 M+2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 27.2 NaN NaN NaN NaN 8.9
1 JPM FNCI 1.5 2020 Apr 27.6 NaN NaN NaN 8.9 NaN
2 JPM FNCI 1.5 2020 Apr 28 7.23 NaN NaN NaN NaN
3 JPM FNCI 1.5 2020 Apr 28.7 NaN 5.13 7.53 NaN NaN
4 JPM FNCI 1.5 2020 Aug 24.9 NaN NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM G2SF 5.5 2008 May 1.2 24 21 21 24.3 NaN
7070 JPM G2SF 5.5 2008 Nov 1.1 23 21 20 23.2 NaN
7071 JPM G2SF 5.5 2008 Nov 1.3 NaN NaN NaN NaN 21.9
7072 JPM G2SF 5.5 2008 Oct 1.1 21 20 23 24 25 NaN
7073 JPM G2SF 5.5 2008 Sep 1.1 21 24 25 22 22 23
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您要的东西没有多大意义。如果您旋转数据框,并制作其索引列,而索引列对于每个枢纽值都不会带有伴随的值...则可以预期拥有NAN值。
您可以使它变得非常宽泛:
或者可以解开具有有趣的索引:
但是,除非您更新那些略有不同以至于相同的列,否则它将无法神奇地为您做到这一点并从中创建一些东西没有什么。
一种方法可能用平均值替代的方法:
输出:
What you're asking for doesn't make much sense. If you pivot a dataframe, and make its index columns that won't have an accompanying value for every pivoted value... then having NaN values is to be expected.
You could make it very wide like:
Or maybe unstack that to have an interesting index:
But unless you update those columns that are slightly different to be the same, it's not going to magically be able to do that for you and create something out of nothing.
One way to approach that might be replacing with the average:
Output: