Python pandas dataframe multilabel-classification

Creating binarized rows in a Python Pandas DataFrame

Posted on 2025-01-09 02:22:39

Suppose the data frame df is built like this:

import pandas as pd
d = { 'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'], 'Genre' : [ 'Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'] }
df = pd.DataFrame(data=d, index=None)

so that it contains:

Elden Ring          Fantasy;Videogame
Starcraft 2         Videogame
Terraforming Mars   Fantasy;Boardgame

My goal is to end with a dataframe that looks like this:

Title               Genres                 Fantasy     Videogame   Boardgame
Elden Ring          [Fantasy, Videogame]      1            1            0
Starcraft 2         [Videogame]              0            1            0
Terraforming Mars   [Fantasy, Boardgame]      1            0            1

What is the best way to go about this? I tried:

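from sklearn.preprocessing import MultiLabelBinarizer
df = pd.DataFrame(data=d, index=None)
# Split each ';'-separated string into a list of genres
df['Genre'] = df['Genre'].str.split(';')
binar = MultiLabelBinarizer()
# One 0/1 column per genre, in the order given by binar.classes_
genre_labels = binar.fit_transform(df['Genre'])
df[binar.classes_] = genre_labels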

This gives me a dataframe:

Title             Genre                 Boardgame   Fantasy     Videogame
Elden Ring        [Fantasy, Videogame]  0             1             1
Starcraft 2       [Videogame]           0             0             1
Terraforming Mars [Fantasy, Boardgame]  1             1             0

This gives me what I want, but it feels convoluted. Is there a cleaner way to do this?

Comments (2)

鲜肉鲜肉永远不皱 2025-01-16 02:22:39

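Or use Series.str.get_dummies. The first two snippets assume Genre holds strings such as '[Fantasy, Videogame]', so the brackets are stripped first: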

df.Genre.str.strip('[]').str.get_dummies(sep=', ')
   Boardgame  Fantasy  Videogame
0          0        1          1
1          0        0          1
2          1        1          0

To append to dataframe:

pd.concat([df, df.Genre.str.strip('[]').str.get_dummies(sep=', ')], axis=1)

               Title                 Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  [Fantasy, Videogame]          0        1          1
1        Starcraft 2           [Videogame]          0        0          1
2  Terraforming Mars  [Fantasy, Boardgame]          1        1          0

If Genre starts out as a column of lists, join each list back into a string first:

df.Genre = df.Genre.str.join(';')
pd.concat([df, df.Genre.str.get_dummies(sep=';')], axis=1)

               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
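
If the list column should stay a list rather than be joined back into a string, one alternative (not from the answer above, just a sketch using standard pandas calls) is to explode the lists, one-hot encode the exploded values with pd.get_dummies, and collapse back to one row per title. The names exploded, indicators and out are illustrative only.

```python
import pandas as pd

# Assumption: Genre already holds Python lists, as after str.split(';').
df = pd.DataFrame({
    "Title": ["Elden Ring", "Starcraft 2", "Terraforming Mars"],
    "Genre": [["Fantasy", "Videogame"], ["Videogame"], ["Fantasy", "Boardgame"]],
})

# One row per (title, genre) pair; the original index is repeated.
exploded = df["Genre"].explode()

# One-hot encode each pair, then take the max per original row so that
# every title ends up with a single row of 0/1 indicators.
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# The Genre column is left untouched as a list column.
out = df.join(indicators)
print(out)
```

This leaves df.Genre as lists, at the cost of an intermediate frame with one row per genre occurrence.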

夏末的微笑 2025-01-16 02:22:39

.str.get_dummies was designed specifically for this:

df = pd.concat([df, df['Genre'].str.get_dummies(';')], axis=1)

Output:

>>> df
               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
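
Note that str.get_dummies returns the genre columns in alphabetical order, while the layout asked for in the question lists them as Fantasy, Videogame, Boardgame and keeps a list-valued Genres column. A minimal sketch of one way to get that exact layout, assuming "order of first appearance" is the intended ordering (the names first_seen and result are hypothetical):

```python
import pandas as pd

d = {
    "Title": ["Elden Ring", "Starcraft 2", "Terraforming Mars"],
    "Genre": ["Fantasy;Videogame", "Videogame", "Fantasy;Boardgame"],
}
df = pd.DataFrame(d)

# One 0/1 column per genre; columns come back sorted alphabetically.
dummies = df["Genre"].str.get_dummies(sep=";")

# List-valued version of the column, as in the question's target frame.
genres = df["Genre"].str.split(";")

# Assumed ordering rule: genres in order of first appearance.
# dict.fromkeys drops duplicates while preserving insertion order.
first_seen = list(dict.fromkeys(g for row in genres for g in row))

# Title, the list column, then the indicators in the chosen order.
result = df[["Title"]].assign(Genres=genres).join(dummies[first_seen])
print(result)
```

If alphabetical columns are acceptable, the reordering step can be dropped and dummies joined directly.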
