Label-encoding the permutations of a combination of columns

Posted 2025-01-22 17:05:10 · 704 characters · 0 views · 0 comments

I'd like to create class labels for each combination of values in two columns using sklearn's LabelEncoder(). How do I achieve the following behavior?

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data.csv", sep=",")
df
#    A    B    
# 0  1  Yes 
# 1  2   No 
# 2  3  Yes 
# 3  4  Yes

I'd like to encode each (A, B) combination as a single class, rather than encoding the two columns separately:

df['A'].astype('category')
#Categories (4, int64): [1, 2, 3, 4]

df['B'].astype('category')
#Categories (2, object): ['Yes','No']

#Column C should have 4 * 2 classes:
(1,Yes)=1  (1,No)=5
(2,Yes)=2  (2,No)=6
(3,Yes)=3  (3,No)=7
(4,Yes)=4  (4,No)=8

#Newdf
#    A    B  C    
# 0  1  Yes  1
# 1  2   No  6
# 2  3  Yes  3
# 3  4  Yes  4

Comments (2)

蓝海 2025-01-29 17:05:10

We can build the mapping DataFrame with a cross merge:

out = df.merge(df[['B']].drop_duplicates().merge(df['A'].drop_duplicates(),how='cross').assign(C=lambda x : x.index+1))
Out[415]: 
   A    B  C
0  1  Yes  1
1  2   No  6
2  3  Yes  3
3  4  Yes  4

More info

df[['B']].drop_duplicates().merge(df['A'].drop_duplicates(),how='cross').assign(C=lambda x : x.index+1)
Out[417]: 
     B  A  C
0  Yes  1  1
1  Yes  2  2
2  Yes  3  3
3  Yes  4  4
4   No  1  5
5   No  2  6
6   No  3  7
7   No  4  8
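
The one-liner above can be unpacked into named steps. This is a minimal sketch, assuming the sample data from the question is built inline instead of read from data.csv; the names levels_a, levels_b and mapping are illustrative only, and how='cross' needs pandas 1.2 or later.

```python
import pandas as pd

# Sample data from the question, built inline instead of reading data.csv.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# Unique values of each column, in order of first appearance.
levels_b = df[["B"]].drop_duplicates()
levels_a = df[["A"]].drop_duplicates()

# Cross merge: every B value paired with every A value (2 * 4 = 8 rows).
mapping = levels_b.merge(levels_a, how="cross")

# Number the combinations 1..8 in the order the cross merge produced them.
mapping["C"] = mapping.index + 1

# Look up the label of each row of the original frame.
out = df.merge(mapping, on=["A", "B"], how="left")
print(out)
```

Because the mapping holds every combination, pairs that never occur in df (such as (2, 'Yes')) still have a reserved label.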

ぃ弥猫深巷。 2025-01-29 17:05:10

You can create an additional column that merges the values of the two columns into one tuple. LabelEncoder cannot encode tuples, so you also need to take the hash() of each tuple:

df['AB'] = df.apply(lambda row: hash((row['A'], row['B'])), axis=1)
le = LabelEncoder()
df['C'] = le.fit_transform(df['AB'])
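
One caveat with the hash-based key: Python salts hash() for strings per process unless PYTHONHASHSEED is fixed, so the hashed values, and with them the order of the resulting labels, can change between runs. As a possible alternative (a pandas swap-in, not LabelEncoder), the sketch below numbers each distinct (A, B) pair with groupby(...).ngroup(). The fifth row is a hypothetical addition to the question's data, included only to show that a repeated pair gets the same label.

```python
import pandas as pd

# Question's data plus one hypothetical repeated pair (2, "No").
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 2],
    "B": ["Yes", "No", "Yes", "Yes", "No"],
})

# ngroup() assigns 0..n_groups-1 to each distinct (A, B) pair.
# With the default sort=True the numbering follows the sorted group keys,
# so it does not depend on hash() and is stable across runs.
df["C"] = df.groupby(["A", "B"]).ngroup()
print(df)
```

Only pairs that actually occur in the data receive a label here; unseen combinations are not reserved.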

However, if you want to preserve the exact label order that you specified, using LabelEncoder() doesn't make sense. You can simply compute column C like this:

df['C'] = df['A'] + (df['B']=='No') * df['A'].max()

Output:

    A   B   C
0   1   Yes 1
1   2   No  6
2   3   Yes 3
3   4   Yes 4
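
The formula above relies on A already being 1..4 and B having only two values. A minimal sketch of the same arithmetic for arbitrary values with a caller-chosen order, assuming the orders are supplied by hand (a_order and b_order are illustrative names, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# Explicit category orders (assumed here); position in the list is the code.
a_order = [1, 2, 3, 4]
b_order = ["Yes", "No"]

# .codes gives each value's position in the chosen order (-1 if not listed).
a_codes = pd.Categorical(df["A"], categories=a_order).codes.astype("int64")
b_codes = pd.Categorical(df["B"], categories=b_order).codes.astype("int64")

# Same layout as in the question: the A position varies fastest,
# and each B value shifts the label by a block of len(a_order).
df["C"] = a_codes + len(a_order) * b_codes + 1
print(df)
```

Values missing from a_order or b_order get the code -1, so they should be checked for before the labels are used.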

EDIT:

If you want to reserve labels for combinations that do not occur in the data (e.g. (2, 'Yes')) and need a solution for an arbitrary number of classes, you can use two LabelEncoder() instances:

leA = LabelEncoder()
leB = LabelEncoder()
leA.fit(df['A'])
leB.fit(df['B'])
df['C'] = leA.transform(df['A']) + leA.classes_.size * leB.transform(df['B']) + 1  # + 1 if you want labels to start from 1

But in this case you cannot preserve the custom order: each encoder sorts its classes automatically, e.g. [1, 2, 3, 4] and ['No', 'Yes'].

Output:

    A   B   C
0   1   Yes 5
1   2   No  2
2   3   Yes 7
3   4   Yes 8
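
A combined label built this way can also be split back into its (A, B) pair. The sketch below is a self-contained version of the two-encoder approach with a decoding step added; the decoding via divmod and the names le_a, le_b and n_a are illustrative additions, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# One encoder per column; each sorts its classes automatically.
le_a = LabelEncoder().fit(df["A"])
le_b = LabelEncoder().fit(df["B"])
n_a = le_a.classes_.size

# Combined label: A code + (number of A classes) * B code, shifted to start at 1.
df["C"] = le_a.transform(df["A"]) + n_a * le_b.transform(df["B"]) + 1

# Decoding: undo the shift, then split into the B code (quotient)
# and the A code (remainder).
b_idx, a_idx = divmod(df["C"].to_numpy() - 1, n_a)
decoded = list(zip(le_a.inverse_transform(a_idx).tolist(),
                   le_b.inverse_transform(b_idx).tolist()))
print(df)
print(decoded)
```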