Applying many FunctionTransformers to the same column - sklearn

Posted 01-19 06:55

I have only one input, the user's email address, and I create several functions that derive features from it using scikit-learn's FunctionTransformer. For example:

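import numpy as np
from lightgbm import LGBMClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Placeholder addresses: the originals were obfuscated when the page was scraped.
X = np.array(['alice@gmail.com', 'bob@yahoo.com'])
y = np.array([True, False])

def email_length(email) -> list:
    # Length of the part before the "@"
    return [len(e.split('@')[0]) for e in email]

def domain_length(email) -> list:
    # Length of the part after the "@"
    return [len(e.split('@')[-1]) for e in email]

def number_of_vouls(email) -> list:
    # Number of vowels in the part before the "@"
    vouls = 'aeiouAEIOU'
    names = [e.split('@')[0] for e in email]
    return [sum(1 for char in name if char in vouls) for name in names]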

After creating the functions, I wrap each one in a FunctionTransformer:

email_length1 = FunctionTransformer(email_length)
domain_length1 = FunctionTransformer(domain_length)
number_of_vouls1 = FunctionTransformer(number_of_vouls)

Then I create the Pipeline:

pipe = Pipeline([
        ('email_length', email_length1),
        ('domain_length', domain_length1),
        ('number_of_vouls', number_of_vouls1),
        ('classifier', LGBMClassifier())
        ])

But when I try to fit the model:

pipe.fit(X, y)

I get AttributeError: 'int' object has no attribute 'split', yet calling a function directly works:

domain_length(X)
# Output: [9, 9]


Answer by 梦冥, 2025-01-26 06:55:17

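Pipeline steps are applied sequentially: each transformer's output becomes the next step's input. Your second transformer therefore receives the email lengths (integers) rather than the email addresses, and calling .split() on an integer raises the AttributeError.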
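The data flow can be mimicked in plain Python to see exactly where the error comes from. This is only an illustration of the chaining, not how Pipeline is implemented internally, and the email addresses are hypothetical placeholders.

```python
# Hypothetical placeholder addresses (the originals were obfuscated on the page).
emails = ["alice@gmail.com", "bob@yahoo.com"]

def email_length(email):
    return [len(e.split("@")[0]) for e in email]

def domain_length(email):
    return [len(e.split("@")[-1]) for e in email]

# A Pipeline feeds each step's output into the next step, roughly like this:
step1 = email_length(emails)      # already integers, no longer strings
error_message = None
try:
    step2 = domain_length(step1)  # tries to call .split() on an int
except AttributeError as exc:
    error_message = str(exc)

print(step1)          # [5, 3]
print(error_message)  # 'int' object has no attribute 'split'
```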

Use a FeatureUnion (or a ColumnTransformer) instead: it applies every transformer to the same input and concatenates their outputs side by side. For example,

preproc = FeatureUnion([
        ('email_length', email_length1),
        ('domain_length', domain_length1),
        ('number_of_vouls', number_of_vouls1),
])

pipe = Pipeline([
        ('preproc', preproc),
        ('classifier', LGBMClassifier())
        ])
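For dense outputs, FeatureUnion joins the transformers' results with np.hstack. The NumPy-only sketch below imitates that combining step (it does not call scikit-learn) using the same hypothetical addresses, to show how 1-D outputs and single-column outputs stack differently.

```python
import numpy as np

emails = np.array(["alice@gmail.com", "bob@yahoo.com"])  # hypothetical placeholders

def email_length(email):
    return [len(e.split("@")[0]) for e in email]

def domain_length(email):
    return [len(e.split("@")[-1]) for e in email]

# Parallel application: every function sees the original email strings.
outputs_1d = [email_length(emails), domain_length(emails)]

# 1-D outputs are joined end to end into one flat vector.
flat = np.hstack(outputs_1d)
print(flat.shape)  # (4,)

# Reshaped to (n_samples, 1), each output becomes its own feature column.
outputs_2d = [np.array(o).reshape(-1, 1) for o in outputs_1d]
matrix = np.hstack(outputs_2d)
print(matrix.shape)  # (2, 2)
```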

You'll then get a new error, because your functions return flat 1-D lists that cannot be stacked as feature columns. Wrapping the results in NumPy arrays and reshaping each one to a single column of shape (n_samples, 1) appears to work:

def email_length(email) -> np.ndarray:
    # Return a 2-D (n_samples, 1) column; make the same change to the other two functions.
    return np.array([len(e.split('@')[0]) for e in email]).reshape(-1, 1)
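Putting the pieces together, here is a minimal end-to-end sketch with all three functions returning (n_samples, 1) arrays. To stay self-contained it swaps LGBMClassifier for scikit-learn's LogisticRegression (an assumption for illustration; LGBMClassifier should fit into the same slot), and the emails and labels are hypothetical placeholder data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical placeholder data, not from the original question.
X = np.array(["alice@gmail.com", "bob@yahoo.com", "eve.smith@outlook.com", "zq@mail.org"])
y = np.array([True, False, True, False])

def email_length(email) -> np.ndarray:
    # Length of the part before the "@", as one column
    return np.array([len(e.split("@")[0]) for e in email]).reshape(-1, 1)

def domain_length(email) -> np.ndarray:
    # Length of the part after the "@", as one column
    return np.array([len(e.split("@")[-1]) for e in email]).reshape(-1, 1)

def number_of_vouls(email) -> np.ndarray:
    # Vowel count in the part before the "@", as one column
    vouls = "aeiouAEIOU"
    names = [e.split("@")[0] for e in email]
    return np.array([sum(1 for char in name if char in vouls) for name in names]).reshape(-1, 1)

# Each transformer receives the same raw X; the union stacks their columns.
preproc = FeatureUnion([
    ("email_length", FunctionTransformer(email_length)),
    ("domain_length", FunctionTransformer(domain_length)),
    ("number_of_vouls", FunctionTransformer(number_of_vouls)),
])

pipe = Pipeline([
    ("preproc", preproc),
    ("classifier", LogisticRegression()),  # stand-in for LGBMClassifier
])

features = preproc.fit_transform(X)
print(features.shape)  # (4, 3): one row per email, one column per feature

pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions)
```

The design point is that FeatureUnion fans the same input out to every feature function, whereas Pipeline is only the right tool for steps that genuinely consume the previous step's output (here, feature extraction followed by the classifier).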


About the author: 快乐很简单
