Random under-sampling after SMOTE using imblearn

Posted on 2025-01-20 09:31:25

I am trying to combine over-sampling and under-sampling using SMOTE() and RandomUnderSampler().

I am working on the loan_status dataset.

I have done the following split.

X = df.drop(['Loan_Status'], axis=1).values  # independent features
y = df['Loan_Status'].values  # dependent variable

This is what my training data's distribution looks like.

[Figure: target variable frequency count]

This is the code snippet that I tried to run for class balancing.

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import make_pipeline
over = SMOTE(sampling_strategy=0.1)
under = RandomUnderSampler(sampling_strategy=0.5)
pipeline = make_pipeline(over,under)
    
x_sm,y_sm = pipeline.fit_resample(X_train,y_train)

It gave me a ValueError with the following traceback:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_64588/3438707951.py in <module>
      4 pipeline = make_pipeline(over,under)
      5 
----> 6 x_copy,y_copy = pipeline.fit_resample(x_train_copy,y_train_copy)

~\Anaconda3\lib\site-packages\imblearn\pipeline.py in fit_resample(self, X, y, **fit_params)
    351             fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    352             if hasattr(last_step, "fit_resample"):
--> 353                 return last_step.fit_resample(Xt, yt, **fit_params_last_step)
    354 
    355     @if_delegate_has_method(delegate="_final_estimator")

~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
     77         X, y, binarize_y = self._check_X_y(X, y)
     78 
---> 79         self.sampling_strategy_ = check_sampling_strategy(
     80             self.sampling_strategy, y, self._sampling_type
     81         )

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs)
    532         return OrderedDict(
    533             sorted(
--> 534                 _sampling_strategy_float(sampling_strategy, y, sampling_type).items()
    535             )
    536         )

~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
    391             ]
    392         ):
--> 393             raise ValueError(
    394                 "The specified ratio required to generate new "
    395                 "sample in the majority class while trying to "

ValueError: The specified ratio required to generate new sample in the majority class while trying to remove samples. Please increase the ratio.

Comments (1)

留一抹残留的笑 2025-01-27 09:31:25

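You have to increase the sampling strategy for SMOTE. A float sampling_strategy is the desired ratio of minority to majority samples after resampling, and your current ratio, (y_train==0).sum() / (y_train==1).sum(), is already higher than 0.1, so SMOTE cannot reach 0.1 by adding samples. Your starting imbalance ratio looks (by eye) to be about 0.4. Try: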

over = SMOTE(sampling_strategy=0.5)
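
To check the ratio instead of reading it off the plot, something like the following sketch may help. It uses only NumPy; `minority_majority_ratio` and the `y_demo` labels are made up for illustration, so substitute your own `y_train`.

```python
import numpy as np


def minority_majority_ratio(y):
    """Return (minority class count) / (majority class count) for a label array."""
    _, counts = np.unique(y, return_counts=True)
    return counts.min() / counts.max()


# Hypothetical labels standing in for y_train: 40 of class 0, 100 of class 1.
y_demo = np.array([0] * 40 + [1] * 100)

ratio = minority_majority_ratio(y_demo)
print(f"current minority/majority ratio: {ratio:.2f}")

# SMOTE only adds minority samples, so its float target has to be
# larger than the ratio the data already has.
smote_target = 0.5
print("SMOTE target is reachable:", smote_target > ratio)
```

Whatever value you pass to SMOTE should come out larger than the printed ratio, and the value you pass to RandomUnderSampler should in turn be at least as large as the SMOTE value.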

Finally, you probably want an equal final ratio (after the under-sampling), so you should set the sampling strategy to 1.0 for the RandomUnderSampler:

under = RandomUnderSampler(sampling_strategy=1)

Try it this way, and let me know if you run into other problems.
