How can I improve the accuracy of my Random Forest regression model?

Posted 2025-01-31 18:25:59

Issue: I'm getting an R² of about 0.64 and want to improve my results further, but I don't know what the problem is. I have already removed outliers, converted strings to numeric values, and normalized the data. Is there any issue with my output? Please ask me anything if I didn't phrase the question correctly; this is just my start on Stack Overflow.

y.value_counts()
3.3    215
3.0    185
2.7    154
3.7    134
2.3     96
4.0     54
2.0     31
1.7     21
1.3     20

This is a histogram of my outputs. I am not an expert in regression and need help from your side.

Histogram of my Outputs
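For readers without the original image, the histogram can be reconstructed from the `value_counts()` output above (a sketch; the filename `y_hist.png` and bin count are arbitrary choices, not from the original post):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild a target series matching the value_counts() output above
counts = {3.3: 215, 3.0: 185, 2.7: 154, 3.7: 134, 2.3: 96,
          4.0: 54, 2.0: 31, 1.7: 21, 1.3: 20}
y = pd.Series([gpa for gpa, n in counts.items() for _ in range(n)])

plt.hist(y, bins=9)
plt.xlabel("GPA")
plt.ylabel("Count")
plt.savefig("y_hist.png")
```

The distribution is concentrated around 2.7–3.7, which matters later: with most targets in a narrow band, small absolute errors already put R² in the 0.6 range.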

Removing collinearity from my inputs


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# data = z_scores(df)
data = df
correlation = data.corr()

# Keep the k columns most strongly correlated with the target GPA column
k = 22
cols = correlation.nlargest(k, 'Please enter your Subjects GPA which you have studied? (CS) [Introduction to ICT]')['Please enter your Subjects GPA which you have studied? (CS) [Introduction to ICT]'].index
cm = np.corrcoef(data[cols].values.T)
f, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(cm, vmax=.8, linewidths=0.01, square=True, annot=True, cmap='viridis',
            linecolor="white", xticklabels=cols.values, annot_kws={'size': 12}, yticklabels=cols.values)


# Drop the target column itself and one collinear feature from the selection
cols = pd.DataFrame(cols)
cols = cols.set_axis(["Selected Features"], axis=1)
cols = cols[cols['Selected Features'] != 'Please enter your Subjects GPA which you have studied? (CS) [Introduction to ICT]']
cols = cols[cols['Selected Features'] != 'Your Fsc/Ics marks percentage?']
X = df[cols['Selected Features'].tolist()]
X

Then I applied a Random Forest regressor and got these results:

import math
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
model = regressor.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MAE Score: ", mean_absolute_error(y_test, y_pred))
print("MSE Score: ", mean_squared_error(y_test, y_pred))
print("RMSE Score: ", math.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 score : %.2f" % r2_score(y_test, y_pred))

Got these results:

MAE Score:  0.252967032967033
MSE Score:  0.13469450549450546
RMSE Score:  0.36700750059706605
R2 score : 0.64


Comments (1)

oО清风挽发oО 2025-02-07 18:26:00

In order to get better results you need to do hyperparameter tuning; try to focus on these:

  1. n_estimators = number of trees in the forest
    max_features = max number of features considered for splitting a node
    max_depth = max number of levels in each decision tree
    min_samples_split = min number of data points placed in a node before the node is split
    min_samples_leaf = min number of data points allowed in a leaf node
    bootstrap = method for sampling data points (with or without replacement)  
    
  2. Parameters currently in use (RandomForestRegressor):
    {'bootstrap': True,
    'criterion': 'mse',
    'max_depth': None,
    'max_features': 'auto',
    'max_leaf_nodes': None,
    'min_impurity_decrease': 0.0,
    'min_impurity_split': None,
    'min_samples_leaf': 1,
    'min_samples_split': 2,
    'min_weight_fraction_leaf': 0.0,
    'n_estimators': 10,
    'n_jobs': 1,
    'oob_score': False,
    'random_state': 42,
    'verbose': 0,
    'warm_start': False} 
    
  3. k-fold cross-validation

  4. Use GridSearchCV
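Points 1, 3, and 4 above combine naturally in a single `GridSearchCV` call, which runs k-fold cross-validation over every parameter combination. A sketch with a deliberately small grid and synthetic stand-in data (grid values are illustrative, not tuned for the questioner's dataset):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in data; replace with the real X and y
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation per candidate
    scoring="r2",
    n_jobs=-1,       # parallelize across cores
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV R2: %.2f" % search.best_score_)
```

For larger grids, `RandomizedSearchCV` with the same interface is usually a cheaper first pass.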
