Need to retrieve the original data shape from a one-hot encoded shape

Posted 2025-02-10 08:35:35, 3468 characters, 1 view, 0 comments

I received a dataset that includes columns that were previously one-hot encoded. I want to restore them to their original shape so that I can do some preprocessing, apply methods for filling NAs, and of course read the stats model of the dataset.

The data columns I got:

(All column names below share the prefix team2_offensive_derived_.)

var_0  var_1  var_2  var_3  var_4  var_5  var_6  var_7  var_8  var_9  var_10
    0      0      0      0      0      0      0      0      0      0       1
    0      0      0      0      0      0      0      0      0      1       0
    0      0      0      0      0      0      0      0      1      0       0
    0      0      0      0      0      0      0      1      0      0       0
    0      0      0      0      0      0      1      0      0      0       0
    0      0      0      0      0      1      0      0      0      0       0
    0      0      0      0      1      0      0      0      0      0       0
    0      0      0      1      0      0      0      0      0      0       0
    0      0      1      0      0      0      0      0      0      0       0
    0      1      0      0      0      0      0      0      0      0       0
    1      0      0      0      0      0      0      0      0      0       0

I want to transform its shape into:

row_id  team2_offensive_derived
     0                   var 10
     1                    var 9
     2                    var 8
     3                    var 7
     4                    var 6
     5                    var 5
     6                    var 4
     7                    var 3
     8                    var 2
     9                    var 1
    10                    var 0

I also got columns like:

(All column names below share the prefix team2_other_ratio_.)

var_42  var_43  var_44  var_45  var_46  var_47    var_48  var_49    var_50    var_51  var_52
   0.0   0.400   0.200   0.000   0.750   0.250  0.341121   0.375  0.354167  0.184211   0.000

But I'm confused about how I should restore them to their original shape. Presumably the original was "categorical", but I don't know how to get back to it.

Thank you all for your help

Comments (2)

剩一世无双 2025-02-17 08:35:35

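You can use stack(): replace the zeros with NA so that only the cells holding a 1 survive the reshape, then read the category from the column label.

import pandas as pd

cols = ['row_id', 'team2_offensive_derived']
# dropna() keeps the result the same on newer pandas versions, where stack() may keep NA values
out = df.replace(0, pd.NA).stack().dropna().rename_axis(cols).reset_index()[cols]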

output:

    row_id         team2_offensive_derived
0        0  team2_offensive_derived_var_10
1        1   team2_offensive_derived_var_9
2        2   team2_offensive_derived_var_8
3        3   team2_offensive_derived_var_7
4        4   team2_offensive_derived_var_6
5        5   team2_offensive_derived_var_5
6        6   team2_offensive_derived_var_4
7        7   team2_offensive_derived_var_3
8        8   team2_offensive_derived_var_2
9        9   team2_offensive_derived_var_1
10      10   team2_offensive_derived_var_0

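With shorter category names (the common prefix stripped from the column names):

# same approach, but remove the shared prefix before stacking
out = (df
       .replace(0, pd.NA)
       .rename(columns=lambda x: x.replace('team2_offensive_derived_', ''))
       .stack()
       .dropna()  # no-op on older pandas, where stack() already drops NA values
       .rename_axis(cols)
       .reset_index()[cols]
      )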

output:

    row_id team2_offensive_derived
0        0                  var_10
1        1                   var_9
2        2                   var_8
3        3                   var_7
4        4                   var_6
5        5                   var_5
6        6                   var_4
7        7                   var_3
8        8                   var_2
9        9                   var_1
10      10                   var_0
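
Because every row of a one-hot block holds exactly one 1, another way to decode it is to ask each row for the label of the column containing its maximum. The following is only a minimal sketch and not part of the answer above: the sample frame is rebuilt from the question's table, and the names prefix and decoded are illustrative assumptions.

```python
import numpy as np
import pandas as pd

prefix = 'team2_offensive_derived_'

# Sample frame mirroring the question: row i has its single 1 in column var_(10 - i).
df = pd.DataFrame(
    np.fliplr(np.eye(11, dtype=int)),
    columns=[f'{prefix}var_{i}' for i in range(11)],
)

# idxmax is only meaningful if every row is a valid one-hot row (exactly one 1).
assert (df.sum(axis=1) == 1).all(), 'some rows are not one-hot'

# For each row, take the label of the column holding the 1, then strip the prefix.
decoded = df.idxmax(axis=1).str.replace(prefix, '', regex=False)

out = (decoded
       .rename('team2_offensive_derived')
       .rename_axis('row_id')
       .reset_index())
print(out)
```

The explicit row-sum check matters here: on a row of all zeros idxmax would silently return the first column, so rows with missing categories should be handled before decoding.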

半岛未凉 2025-02-17 08:35:35

Use:

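import numpy as np

# split each column name on its last two underscores: (prefix, 'var', number)
df.columns = df.columns.str.rsplit('_', n=2, expand=True)
# replace 0 with NaN and stack the last two levels; dropping NaN rows keeps only the 1 entries
df = df.replace(0, np.nan).stack([1, 2]).dropna()
# join the 'var' and number levels of the MultiIndex into one label
df.index = df.index.map(lambda x: f'{x[1]} {x[2]}')
# build the final DataFrame from the index
df = df.index.to_frame(name='team2_offensive_derived', index=False)

print(df)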
   team2_offensive_derived
0                   var 10
1                    var 9
2                    var 8
3                    var 7
4                    var 6
5                    var 5
6                    var 4
7                    var 3
8                    var 2
9                    var 1
10                   var 0
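
On pandas 1.5 or later, the built-in pd.from_dummies may also be worth trying, since it is meant to invert one-hot encoding. The sketch below is an assumption-laden illustration rather than part of the answer above: it rebuilds the question's table as sample data and assumes every encoded column follows the pattern <name>_var_<n>, so that '_var_' can serve as the separator.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question: row i has its single 1 in column var_(10 - i).
df = pd.DataFrame(
    np.fliplr(np.eye(11, dtype=int)),
    columns=[f'team2_offensive_derived_var_{i}' for i in range(11)],
)

# The text before the first '_var_' becomes the output column name,
# the text after it becomes the category value.
decoded = pd.from_dummies(df, sep='_var_')

# Rebuild the "var N" labels used in the question's desired output.
decoded['team2_offensive_derived'] = (
    'var ' + decoded['team2_offensive_derived'].astype(str)
)
print(decoded)
```

from_dummies raises an error when a row has more than one 1, or no 1 at all unless default_category is given, which makes it a reasonable sanity check that a block of columns really is one-hot encoded.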