Creating binarized rows in a Python Pandas DataFrame
Suppose the data frame df is

    import pandas as pd
    d = { 'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'], 'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'] }
    df = pd.DataFrame(data=d, index=None)

so that it looks like this:

    Title              Genre
    Elden Ring         Fantasy;Videogame
    Starcraft 2        Videogame
    Terraforming Mars  Fantasy;Boardgame
My goal is to end with a dataframe that looks like this:

    Title              Genres                Fantasy  Videogame  Boardgame
    Elden Ring         [Fantasy, Videogame]  1        1          0
    Starcraft 2        [Videogame]           0        1          0
    Terraforming Mars  [Fantasy, Boardgame]  1        0          1

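What is the best way to go about this? I tried:

    from sklearn.preprocessing import MultiLabelBinarizer

    df = pd.DataFrame(data=d, index=None)
    # Split each ';'-delimited string into a list of genres
    df['Genre'] = df['Genre'].str.split(';')
    # Fit the binarizer and get one 0/1 column per genre
    binar = MultiLabelBinarizer()
    genre_labels = binar.fit_transform(df['Genre'])
    # classes_ holds the genre names in sorted order
    df[binar.classes_] = genre_labels
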
This gives me a dataframe:

    Title              Genre                 Boardgame  Fantasy  Videogame
    Elden Ring         [Fantasy, Videogame]  0          1        1
    Starcraft 2        [Videogame]           0          0        1
    Terraforming Mars  [Fantasy, Boardgame]  1          1        0

This gives me what I want, but it feels convoluted. Is there a cleaner way to do this?
Comments (2)
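Or use Series.str.get_dummies, passing ';' as the separator, to build the indicator columns directly from the original Genre strings. The result can then be appended to the dataframe. If Genre has already been converted to a list type, Series.str.get_dummies will not split it as is; one option is to join each list back into a delimited string first.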
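A minimal sketch of both cases, assuming pandas is installed and ';' is the separator, as in the question. Using DataFrame.join to append the columns and Series.str.join to handle a list-valued Genre are my assumptions about how to fill in these steps, not necessarily the code from the original comment.

```python
import pandas as pd

d = {
    'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'],
    'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'],
}
df = pd.DataFrame(data=d)

# Case 1: Genre is still a ';'-delimited string.
# str.get_dummies splits on the separator and returns one 0/1 column per label.
dummies = df['Genre'].str.get_dummies(sep=';')
appended = df.join(dummies)  # append the indicator columns to the dataframe

# Case 2: Genre has already been split into lists (as in the question's attempt).
# Assumption: join the lists back into strings, then reuse str.get_dummies.
df_lists = df.assign(Genre=df['Genre'].str.split(';'))
dummies_from_lists = df_lists['Genre'].str.join(';').str.get_dummies(sep=';')
appended_lists = df_lists.join(dummies_from_lists)

print(appended_lists)
```

Both cases produce the same indicator columns; the second keeps Genre as a list column, which matches the layout shown in the question.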

.str.get_dummies was designed specifically for this.
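As a hedged illustration of that point, the sketch below calls str.get_dummies on the question's data. The columns come back in sorted order, so the final reindexing step is an assumption of mine about how to get the exact column order the question asked for (Fantasy, Videogame, Boardgame); the name desired_order is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'],
    'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'],
})

# One call replaces the split + MultiLabelBinarizer steps from the question.
genre_flags = df['Genre'].str.get_dummies(sep=';')

# Columns are returned sorted: Boardgame, Fantasy, Videogame.
# Hypothetical extra step: select them in the order requested in the question.
desired_order = ['Fantasy', 'Videogame', 'Boardgame']
result = df.join(genre_flags[desired_order])

print(result)
```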