Random Forest with Grid Search performs worse than a simple Random Forest
I am training a model using a simple Random Forest, and then another model on the exact same dataset using Random Forest with Grid Search. Supposedly, since Grid Search looks for the best combination of values, the performance of the latter should be higher, but the opposite is happening.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Random Forest
clf = RandomForestClassifier()
model = clf.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Model metrics
results = classifmodel_Metrics('rf', model, y_test, y_pred)
list_of_results.append(results)

# GridSearchCV
clf = RandomForestClassifier()
parameter_grid = {'n_estimators': [50, 100, 150, 250, 500, 1000, 1500, 2000, 2500, 3000],
                  'max_depth': [1, 2, 3, 4, 5, 6]}
gridSearch = GridSearchCV(clf, parameter_grid, cv=5, n_jobs=1, verbose=5)
gridSearchResults = gridSearch.fit(X, y)
print(gridSearchResults.best_estimator_)

clf = gridSearchResults.best_estimator_
model = clf.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Model metrics
results = classifmodel_Metrics('rfopt', model, y_test, y_pred)
list_of_results.append(results)
print(list_of_results)
Does anyone know why this is happening? Is something wrong with my code, or is it something that can sporadically happen?
The function I use to calculate my model performance is below, with F1 being the value I use as a reference (the higher the F1, the better the model):
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def classifmodel_Metrics(modelName, model, actual, predicted):
    classes = list(np.unique(np.concatenate((actual, predicted))))
    confMtx = confusion_matrix(actual, predicted)
    print("Confusion Matrix")
    print(confMtx)

    report = classification_report(actual, predicted, output_dict=True)
    precision = report["macro avg"]["precision"]
    recall = report["macro avg"]["recall"]
    f1 = report["macro avg"]["f1-score"]  # macro-averaged F1 (per-class F1 averaged over classes)

    res = pd.Series({
        "ModelName": modelName,
        "Model": model,
        "accuracy": round(accuracy_score(actual, predicted), 3),
        "precision": round(precision, 3),
        "recall": round(recall, 3),
        "f1": round(f1, 3)
    })

    if len(classes) == 2:
        print("\naccuracy: {0:.2%}".format(round(accuracy_score(actual, predicted), 3)))
        print("\nprecision: {0:.2%}".format(precision))
        print("\nrecall: {0:.2%}".format(recall))
        print("\nf1: {0:.2%}".format(f1))
    else:
        print("\n", classification_report(actual, predicted))

    return res
1 Answer
It's likely that your parameter grid is just not capturing enough depth for your model to learn well.
You have:
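parameter_grid = {'n_estimators': [50, 100, 150, 250, 500, 1000, 1500, 2000, 2500, 3000],
                  'max_depth': [1, 2, 3, 4, 5, 6]}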
where you are limiting your model to at most 64 leaves (2^6) by capping max_depth at 6. In contrast, the scikit-learn default (max_depth=None) grows each tree as many layers as necessary, until each of your n samples can end up in its own leaf.
To improve performance I would use more depth options. It's also highly unlikely that you need so many trees in your forest (diminishing returns). Try something like this instead:
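For example, a grid along these lines (the specific values here are illustrative, not the original answer's exact suggestion) covers a much wider range of depths with far fewer tree counts:

# Illustrative grid: fewer n_estimators values, wider max_depth range (values are assumptions)
parameter_grid = {'n_estimators': [100, 250, 500],
                  'max_depth': [2, 4, 8, 16, 32, None]}
gridSearch = GridSearchCV(RandomForestClassifier(), parameter_grid, cv=5, n_jobs=1, verbose=5)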
If your best model is topping out the max depth, you can try increasing the limit you are testing.
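A minimal sketch of that check, assuming the gridSearchResults and parameter_grid objects from above:

# If the selected depth equals the largest finite depth tested, the grid may be too restrictive
best_depth = gridSearchResults.best_params_['max_depth']
finite_depths = [d for d in parameter_grid['max_depth'] if d is not None]
if best_depth is not None and best_depth == max(finite_depths):
    print("Best max_depth hit the upper limit of the grid; try testing larger values.")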