Random under-sampling after SMOTE with imblearn

Posted on 2025-01-20 09:31:25


I am trying to combine over-sampling and under-sampling using SMOTE() and RandomUnderSampler().

I am working on the loan_status dataset.

I have split the data as follows.

X = df.drop(['Loan_Status'], axis=1).values   # independent features
y = df['Loan_Status'].values                  # dependent variable

This is what my training data's distribution looks like.

This is the code snippet that I tried to execute for class balancing.

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import make_pipeline

over = SMOTE(sampling_strategy=0.1)                 # step 1: over-sample the minority class
under = RandomUnderSampler(sampling_strategy=0.5)   # step 2: under-sample the majority class
pipeline = make_pipeline(over, under)

x_sm, y_sm = pipeline.fit_resample(X_train, y_train)

It gave me a ValueError with the following traceback:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_64588/3438707951.py in <module>
      4 pipeline = make_pipeline(over,under)
      5 
----> 6 x_copy,y_copy = pipeline.fit_resample(x_train_copy,y_train_copy)

~\Anaconda3\lib\site-packages\imblearn\pipeline.py in fit_resample(self, X, y, **fit_params)
    351             fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    352             if hasattr(last_step, "fit_resample"):
--> 353                 return last_step.fit_resample(Xt, yt, **fit_params_last_step)
    354 
    355     @if_delegate_has_method(delegate="_final_estimator")

~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
     77         X, y, binarize_y = self._check_X_y(X, y)
     78 
---> 79         self.sampling_strategy_ = check_sampling_strategy(
     80             self.sampling_strategy, y, self._sampling_type
     81         )

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs)
    532         return OrderedDict(
    533             sorted(
--> 534                 _sampling_strategy_float(sampling_strategy, y, sampling_type).items()
    535             )
    536         )

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
    391             ]
    392         ):
--> 393             raise ValueError(
    394                 "The specified ratio required to generate new "
    395                 "sample in the majority class while trying to "

ValueError: The specified ratio required to generate new sample in the majority class while trying to remove samples. Please increase the ratio.


留一抹残留的笑 2025-01-27 09:31:25


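You have to increase the sampling_strategy for SMOTE, because the current minority-to-majority ratio, (y_train == 0).sum() / (y_train == 1).sum(), is already higher than 0.1. For an over-sampler, a float sampling_strategy is the minority-to-majority ratio wanted after resampling, so it has to be larger than the ratio you start with. Judging by eye, your starting imbalance ratio is about 0.4. Try: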

over = SMOTE(sampling_strategy=0.5)
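
Rather than estimating the ratio by eye, it can be computed from the labels. The following is a minimal sketch using only the standard library. `minority_ratio` and `over_ratio_is_usable` are hypothetical helper names, not part of imblearn, and the 1000/400 label counts are made up to match the roughly 0.4 ratio assumed here. It relies on the documented binary-case meaning of a float sampling_strategy for over-samplers: the minority-to-majority ratio after resampling.

```python
from collections import Counter


def minority_ratio(labels):
    """Return (minority count) / (majority count) for binary labels."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return min(counts.values()) / max(counts.values())


def over_ratio_is_usable(labels, ratio):
    """Check that an over-sampling ratio would create at least one new
    minority sample, i.e. that it lies above the current ratio."""
    counts = Counter(labels)
    n_major = max(counts.values())
    n_minor = min(counts.values())
    n_new = int(n_major * ratio - n_minor)  # minority samples to synthesize
    return n_new > 0


# Made-up labels: 1000 majority (1) and 400 minority (0) samples.
y_example = [1] * 1000 + [0] * 400

print(minority_ratio(y_example))             # 0.4
print(over_ratio_is_usable(y_example, 0.1))  # False: 0.1 is below 0.4
print(over_ratio_is_usable(y_example, 0.5))  # True: 100 new samples needed
```

On real data, passing y_train to `minority_ratio` gives the lower bound that SMOTE's sampling_strategy has to exceed.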

Finally, you probably want an equal final ratio (after the under-sampling), so you should set the sampling_strategy to 1.0 for the RandomUnderSampler:

under = RandomUnderSampler(sampling_strategy=1)
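
To see why the two ratios have to be chosen together, here is a rough sketch of the count arithmetic for a binary problem. `plan_resampling` is a hypothetical helper written for illustration, not an imblearn function, and the 1000/400 class counts are assumed values chosen to give a starting ratio of 0.4. It assumes each float ratio means the minority-to-majority ratio after that step, and it reports the same two failure cases as "please increase the ratio" errors.

```python
def plan_resampling(n_major, n_minor, over_ratio, under_ratio):
    """Predict binary class counts after over-sampling, then under-sampling.

    Returns (n_major_final, n_minor_final); raises ValueError when a ratio
    would require the opposite operation from the one being performed.
    """
    # Step 1: over-sampling grows the minority class to over_ratio * n_major.
    n_new = int(n_major * over_ratio - n_minor)
    if n_new <= 0:
        raise ValueError("over_ratio would remove minority samples; increase it")
    n_minor_after = n_minor + n_new

    # Step 2: under-sampling shrinks the majority class to n_minor_after / under_ratio.
    n_major_after = int(n_minor_after / under_ratio)
    if n_major_after > n_major:
        raise ValueError("under_ratio would add majority samples; increase it")
    return n_major_after, n_minor_after


# The values suggested in this answer give a balanced result.
print(plan_resampling(1000, 400, over_ratio=0.5, under_ratio=1.0))  # (500, 500)

# An under-sampling ratio below the ratio left by step 1 is rejected.
try:
    plan_resampling(1000, 400, over_ratio=0.75, under_ratio=0.5)
except ValueError as err:
    print(err)  # under_ratio would add majority samples; increase it
```

The second call mirrors the situation in the traceback above, where the error is raised from the last pipeline step: after over-sampling, the minority class is already too large for the requested under-sampling ratio, so that ratio has to go up.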

Try it this way, and if you run into other problems, let me know.


About the author

遇见了你
