Label-encoding permutations of a combination of columns
I'd like to create class labels for a permutation of two columns using sklearn's LabelEncoder(). How do I achieve the following behavior?
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
df = pd.read_csv("data.csv", sep=",")
df
# A B
# 0 1 Yes
# 1 2 No
# 2 3 Yes
# 3 4 Yes
I'd like to encode each combination of A and B as a single label, rather than encoding these two columns separately:
df['A'].astype('category')
# Categories (4, int64): [1, 2, 3, 4]
df['B'].astype('category')
# Categories (2, object): ['Yes', 'No']
# Column C should have 4 * 2 = 8 classes:
# (1,Yes)=1 (1,No)=5
# (2,Yes)=2 (2,No)=6
# (3,Yes)=3 (3,No)=7
# (4,Yes)=4 (4,No)=8
# Desired new df
# A B C
# 0 1 Yes 1
# 1 2 No 6
# 2 3 Yes 3
# 3 4 Yes 4
Comments (2)
We can create the mapping df with a cross merge.
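The code from this answer did not survive on this page, so the following is only a minimal sketch of what a cross-merge mapping could look like, not the answerer's exact code. The sample frame, the names `a_values`, `b_values`, `mapping` and `newdf`, and the choice to list 'Yes' before 'No' (to match the numbering in the question) are assumptions. `how="cross"` requires pandas 1.2 or later.

```python
import pandas as pd

# Sample frame mirroring the data in the question (assumed, instead of data.csv).
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# Category values in the order the question asks for:
# 'Yes' first, so that (1, Yes) = 1 and (1, No) = 5.
b_values = pd.DataFrame({"B": ["Yes", "No"]})
a_values = pd.DataFrame({"A": [1, 2, 3, 4]})

# Cross merge: every B value is paired with every A value (8 rows).
mapping = b_values.merge(a_values, how="cross")

# Number the combinations 1..8 in the order they were generated.
mapping["C"] = range(1, len(mapping) + 1)

# Look up the combined label for each original row.
newdf = df.merge(mapping, on=["A", "B"], how="left")
print(newdf)
```

Because the mapping lists every combination, pairs that never occur in the data, such as (2, 'Yes'), still get a label.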
You can create an additional column that merges the values from the two columns into one tuple. But LabelEncoder cannot encode tuples, so you additionally need to take the hash() of each tuple before encoding.

However, if you want to preserve the exact label order that you specified, using LabelEncoder() doesn't make sense. You can simply compute the C column directly from A and B.
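The original snippets and their output are missing here, so the two ideas above are sketched below under stated assumptions rather than reproduced. The column name `C_hashed`, the helper `b_offset`, and the hard-coded offset of 4 (which assumes A only takes the values 1 to 4) are illustrative. Note that Python salts string hashes per process, so the labels in `C_hashed` can differ between runs unless `PYTHONHASHSEED` is fixed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample frame mirroring the data in the question (assumed, instead of data.csv).
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# Approach 1: merge both columns into a tuple, hash it, then label-encode.
# The labels are distinct per combination but their order is arbitrary.
pairs = pd.Series(list(zip(df["A"], df["B"])), index=df.index)
hashed = pairs.map(hash)
df["C_hashed"] = LabelEncoder().fit_transform(hashed)

# Approach 2: compute C directly to keep the order from the question.
# 'Yes' keeps the value of A, 'No' shifts it by the number of A classes (4).
b_offset = df["B"].map({"Yes": 0, "No": 4})
df["C"] = df["A"] + b_offset

print(df)
```

The direct computation is the one that reproduces the numbering requested in the question; the hashed variant only guarantees one label per observed combination.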
EDIT:
If you want to keep labels for missing combinations (e.g. (2, 'Yes')) and need a solution for an arbitrary number of classes, you can use two LabelEncoder() instances. But in this case you cannot preserve the custom order: the list of labels will be automatically sorted, e.g. [1, 2, 3, 4] and ['No', 'Yes'].
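The code for this edit is also missing, so here is a hedged sketch of one way to combine two encoders, not necessarily the answerer's version. The names `le_a`, `le_b`, `a_codes`, `b_codes` and `n_a`, and the formula that combines the two codes into one label, are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample frame mirroring the data in the question (assumed, instead of data.csv).
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# One encoder per column; each sorts its classes automatically.
le_a = LabelEncoder().fit(df["A"])  # classes_: [1, 2, 3, 4]
le_b = LabelEncoder().fit(df["B"])  # classes_: ['No', 'Yes']

a_codes = le_a.transform(df["A"])   # 0-based position of each A value
b_codes = le_b.transform(df["B"])   # 'No' -> 0, 'Yes' -> 1
n_a = len(le_a.classes_)

# Combine both codes into a single 1-based label covering all
# n_a * n_b combinations, including ones absent from the data.
df["C"] = b_codes * n_a + a_codes + 1

print(df)
```

With the sorted class order, 'No' comes first, so (1, 'No') maps to 1 and (1, 'Yes') maps to 5, which is the reverse of the numbering in the question.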