Random under-sampling after SMOTE using imblearn
I am trying to combine over-sampling and under-sampling using RandomUnderSampler() and SMOTE().
I am working on the loan_status dataset.
I have done the following split.
X = df.drop(['Loan_Status'], axis=1).values  # independent features
y = df['Loan_Status'].values  # dependent variable
This is what my training data's distribution looks like.
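The same information can be read off as numbers by counting the labels. The sketch below is only an illustration: the labels are made up as a stand-in for y_train (the real counts are not given here), and the names n_minority, n_majority and ratio are introduced for this example.

```python
from collections import Counter

# Hypothetical labels standing in for y_train; the real counts are not known.
y_train = [0] * 70 + [1] * 30

# Count how many samples each class has.
counts = Counter(y_train)
n_minority = min(counts.values())
n_majority = max(counts.values())

# Minority-to-majority ratio, the quantity a float sampling_strategy refers to.
ratio = n_minority / n_majority

print(counts)
print(f"minority/majority ratio: {ratio:.2f}")
```

With a NumPy array, np.unique(y_train, return_counts=True) would give the same counts.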
This is the code snippet that I tried to run for class balancing.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import make_pipeline
over = SMOTE(sampling_strategy=0.1)
under = RandomUnderSampler(sampling_strategy=0.5)
pipeline = make_pipeline(over,under)
x_sm,y_sm = pipeline.fit_resample(X_train,y_train)
It gave me a ValueError with the following traceback:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_64588/3438707951.py in <module>
4 pipeline = make_pipeline(over,under)
5
----> 6 x_copy,y_copy = pipeline.fit_resample(x_train_copy,y_train_copy)
~\Anaconda3\lib\site-packages\imblearn\pipeline.py in fit_resample(self, X, y, **fit_params)
351 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
352 if hasattr(last_step, "fit_resample"):
--> 353 return last_step.fit_resample(Xt, yt, **fit_params_last_step)
354
355 @if_delegate_has_method(delegate="_final_estimator")
~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
77 X, y, binarize_y = self._check_X_y(X, y)
78
---> 79 self.sampling_strategy_ = check_sampling_strategy(
80 self.sampling_strategy, y, self._sampling_type
81 )
~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs)
532 return OrderedDict(
533 sorted(
--> 534 _sampling_strategy_float(sampling_strategy, y, sampling_type).items()
535 )
536 )
~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
391 ]
392 ):
--> 393 raise ValueError(
394 "The specified ratio required to generate new "
395 "sample in the majority class while trying to "
ValueError: The specified ratio required to generate new sample in the majority class while trying to remove samples. Please increase the ratio.
Comments (1)
You have to increase the sampling strategy for SMOTE, because (y_train == 0).sum() / (y_train == 1).sum() is higher than 0.1. Your starting imbalance ratio seems to be about 0.4 (by eye), so try a SMOTE sampling strategy above that value.
Finally, you probably want an equal final ratio (after the under-sampling), so you should set the sampling strategy to 1.0 for the RandomUnderSampler.
Try it this way, and if you run into other problems, let me know.