Many FunctionTransformers on the same column - sklearn
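I have only one input, the email address of a user, and I create several functions that derive features from it, which I then wrap in FunctionTransformers from sklearn. For example:
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer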
X = np.array(['[email protected]', '[email protected]'])
y = np.array([True, False])
def email_length(email) -> list:
    # Length of the part before the '@'
    return [len(e.split('@')[0]) for e in email]
def domain_length(email) -> list:
    # Length of the part after the '@'
    return [len(e.split('@')[-1]) for e in email]
def number_of_vouls(email) -> list:
    # Number of vowels in the part before the '@'
    vouls = 'aeiouAEIOU'
    names = [e.split('@')[0] for e in email]
    return [sum(1 for char in name if char in vouls) for name in names]
After creating the functions, I wrap them in FunctionTransformers:
email_length1 = FunctionTransformer(email_length)
domain_length1 = FunctionTransformer(domain_length)
number_of_vouls1 = FunctionTransformer(number_of_vouls)
Then I create the Pipeline:
pipe = Pipeline([
    ('email_length', email_length1),
    ('domain_length', domain_length1),
    ('number_of_vouls', number_of_vouls1),
    ('classifier', LGBMClassifier())
])
But when I try to fit the model with
pipe.fit(X, y)
I get AttributeError: 'int' object has no attribute 'split'
even though calling one of the functions directly works:
domain_length(X)
Output: [9, 9]

Pipeline steps are applied sequentially, so your second transformer is receiving the email lengths rather than the email addresses.
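To make the sequential behaviour concrete, here is a minimal sketch that chains the first two transformers by hand, which is roughly what the Pipeline does during fit. The addresses are made-up placeholders and the variable names `step1` and `error_message` are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Placeholder addresses, invented for this illustration.
X = np.array(["alice@example.com", "bob@example.org"])


def email_length(emails):
    # Length of the part before the '@'
    return [len(e.split("@")[0]) for e in emails]


def domain_length(emails):
    # Length of the part after the '@'
    return [len(e.split("@")[-1]) for e in emails]


# Step 1 receives the raw addresses and returns integers.
step1 = FunctionTransformer(email_length).fit_transform(X)

# Step 2 receives the output of step 1 (integers), not the addresses,
# so calling .split on each element fails.
try:
    FunctionTransformer(domain_length).fit_transform(step1)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(step1)
print(error_message)
```

Calling `domain_length(X)` directly works because it is given the addresses; inside the Pipeline it only ever sees the previous step's output.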
You can use a ColumnTransformer or FeatureUnion here instead, so that every transformer receives the original email addresses. You'll then get a new error because of the shape of the values your functions return, but wrapping those in numpy arrays and reshaping them into 2D columns appears to work.
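A minimal sketch of the FeatureUnion variant follows. It assumes each function returns an `(n_samples, 1)` numpy array so the outputs can be stacked side by side. The addresses are made-up placeholders, and `LogisticRegression` stands in for `LGBMClassifier` only to keep the example free of extra dependencies; the asker's classifier should slot into the same position.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Placeholder addresses, invented for this illustration.
X = np.array(["alice@example.com", "bob@example.org"])
y = np.array([True, False])

VOWELS = set("aeiouAEIOU")


def email_length(emails):
    # Length of the part before the '@', as an (n_samples, 1) column.
    return np.array([len(e.split("@")[0]) for e in emails]).reshape(-1, 1)


def domain_length(emails):
    # Length of the part after the '@', as an (n_samples, 1) column.
    return np.array([len(e.split("@")[-1]) for e in emails]).reshape(-1, 1)


def number_of_vowels(emails):
    # Vowel count in the part before the '@', as an (n_samples, 1) column.
    counts = [sum(ch in VOWELS for ch in e.split("@")[0]) for e in emails]
    return np.array(counts).reshape(-1, 1)


# Each transformer in the union is applied to the same input X,
# and the resulting columns are concatenated horizontally.
features = FeatureUnion([
    ("email_length", FunctionTransformer(email_length)),
    ("domain_length", FunctionTransformer(domain_length)),
    ("number_of_vowels", FunctionTransformer(number_of_vowels)),
])

pipe = Pipeline([
    ("features", features),
    # Stand-in classifier (assumption); swap in LGBMClassifier() as needed.
    ("classifier", LogisticRegression()),
])

feature_matrix = features.fit_transform(X)  # one row per email, one column per feature
pipe.fit(X, y)
predictions = pipe.predict(X)
```

The union is the only feature step in the Pipeline, so the classifier receives a three-column matrix rather than the output of a chain of transformers.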