Creating binarized rows in a Python pandas DataFrame

Posted on 2025-01-09 02:22:39

Suppose the data frame df is built like this:

import pandas as pd

d = { 'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'], 'Genre' : [ 'Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'] }
df = pd.DataFrame(data=d, index=None)

so that it looks like this:

Title               Genre
Elden Ring          Fantasy;Videogame
Starcraft 2         Videogame
Terraforming Mars   Fantasy;Boardgame

My goal is to end with a dataframe that looks like this:

Title               Genre                 Fantasy   Videogame   Boardgame
Elden Ring          [Fantasy, Videogame]  1         1           0
Starcraft 2         [Videogame]           0         1           0
Terraforming Mars   [Fantasy, Boardgame]  1         0           1

What is the best way to go about this? I tried:

from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame(data=d, index=None)
# Turn each 'A;B' string into a list of labels
df['Genre'] = df['Genre'].str.split(';')
binar = MultiLabelBinarizer()
# One 0/1 column per distinct label, in sorted label order
genre_labels = binar.fit_transform(df['Genre'])
df[binar.classes_] = genre_labels

This gives me a dataframe:

Title             Genre                 Boardgame   Fantasy     Videogame
Elden Ring        [Fantasy, Videogame]  0             1             1
Starcraft 2       [Videogame]           0             0             1
Terraforming Mars [Fantasy, Boardgame]  1             1             0

This gives me what I want, but it feels convoluted. Is there a cleaner way to do this?

Comments (2)

鲜肉鲜肉永远不皱 2025-01-16 02:22:39

Or use Series.str.get_dummies. If Genre holds strings of the form "[Fantasy, Videogame]", strip the brackets first:

df.Genre.str.strip('[]').str.get_dummies(sep=', ')

   Boardgame  Fantasy  Videogame
0          0        1          1
1          0        0          1
2          1        1          0

To append the result to the dataframe:

pd.concat([df, df.Genre.str.strip('[]').str.get_dummies(sep=', ')], axis=1)

               Title                 Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  [Fantasy, Videogame]          0        1          1
1        Starcraft 2           [Videogame]          0        0          1
2  Terraforming Mars  [Fantasy, Boardgame]          1        1          0

If Genre starts out as a column of lists, join each list back into a string first:

df.Genre = df.Genre.str.join(';')
pd.concat([df, df.Genre.str.get_dummies(sep=';')], axis=1)

               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
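
The join above overwrites the list column. If the lists should be kept, as in the goal table in the question, one possible variation is to join only for the get_dummies call. This is a minimal sketch that assumes Genre has already been split into lists; the names dummies and result are just illustrative.

```python
import pandas as pd

d = {
    'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'],
    'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'],
}
df = pd.DataFrame(d)

# Assumption: Genre is already a column of Python lists, as in the question.
df['Genre'] = df['Genre'].str.split(';')

# Join each list into one string only for the get_dummies call,
# so the list column itself is left untouched.
dummies = df['Genre'].str.join(';').str.get_dummies(sep=';')

# Append the 0/1 indicator columns next to the original columns.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Here result keeps Genre as lists and gains one indicator column per genre, in alphabetical order.
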
夏末的微笑 2025-01-16 02:22:39

.str.get_dummies was designed specifically for this:

df = pd.concat([df, df['Genre'].str.get_dummies(';')], axis=1)

Output:

>>> df
               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
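
Note that get_dummies returns the indicator columns in alphabetical order, while the goal table in the question lists them as Fantasy, Videogame, Boardgame. If that order matters, one option is to reorder the columns by first appearance. The sketch below is only an illustration under that assumption; order and result are hypothetical names.

```python
import pandas as pd

d = {
    'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'],
    'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'],
}
df = pd.DataFrame(d)

# Indicator columns, alphabetically ordered by get_dummies.
dummies = df['Genre'].str.get_dummies(sep=';')

# Genres in order of first appearance: split, flatten, drop repeats.
order = df['Genre'].str.split(';').explode().drop_duplicates().tolist()

# Select the indicator columns in that order before appending them.
result = pd.concat([df, dummies[order]], axis=1)
print(result)
```
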