How do I get the best parameters for each DataFrame in a dictionary of DataFrames using GridSearchCV?
One, I have a dictionary of dataframes, dfs, with five different dataframes in it.
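Two, I am using a scikit-learn regressor, RandomForestRegressor, with the following parameter grid:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(random_state=None)
num_estimators = list(np.linspace(10, 100, num=5, endpoint=True).astype(int))
# note: "auto" was deprecated in scikit-learn 1.1 and later removed; drop it on newer versions
max_features = ["auto", "sqrt", "log2"]
min_samples_split = [2, 4, 8]
# keys are prefixed with the pipeline step name ("reg"); no trailing comma, so params stays a dict
params = {'reg__n_estimators': num_estimators,
          'reg__max_features': max_features,
          'reg__min_samples_split': min_samples_split,
          'reg__bootstrap': [False]}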
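Three, the elements of my pipeline are as below:
from sklearn.compose import ColumnTransformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must precede the IterativeImputer import
from sklearn.impute import IterativeImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# numeric columns to use
num_columns = list(subset_features[2:])
# pipeline for processing numerical features
num_transformer = Pipeline([('impute', IterativeImputer()),
                            ('scale', StandardScaler())])
column_transformer = ColumnTransformer([('num_pipeline', num_transformer, num_columns)])
# the pipeline
pipe = Pipeline(steps=[("ct", column_transformer), ("reg", regressor)])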
Finally, the grid search and fit are as follows:
from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(estimator=pipe,
                  param_grid=params,
                  cv=5,
                  n_jobs=-1,
                  verbose=1,
                  scoring=scorer,  # a user-defined scoring function
                  refit=True)
# run the gs for each dataframe
gs_results = {}
for id, df in enumerate(dfs.values()):
    print('starting id:', id)
    gs_results[id] = gs.fit(df)
After running the above, my attempts at getting the best parameters for each dataframe with gs.best_params_
retrieve only one set of best parameters, shown below.
Best params: {'bootstrap': False, 'max_features': 'log2', 'min_samples_split': 4, 'n_estimators': 10}
What I want is to get five best parameter estimates, one for each dataframe.
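A likely reason only one set shows up, sketched here on synthetic data rather than the real dataframes: `GridSearchCV.fit` returns the search object itself, so every dictionary entry in the loop refers to the same `gs`, and each new fit overwrites `best_params_`. The arrays, the small grid, and the `results` name below are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for two of the dataframes (assumption: not the real data).
rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(40, 3)), rng.normal(size=40)
X_b, y_b = rng.normal(size=(40, 3)), rng.normal(size=40)

gs = GridSearchCV(RandomForestRegressor(n_estimators=5, random_state=0),
                  param_grid={"min_samples_split": [2, 4]},
                  cv=3)

results = {}
results[0] = gs.fit(X_a, y_a)  # fit returns gs itself, not a copy
results[1] = gs.fit(X_b, y_b)  # refits the same object, overwriting best_params_

# Both keys point at the one search object, which now holds only the last fit.
same_object = results[0] is results[1] is gs
print(same_object)  # → True
```

Reading `gs.best_params_` inside the loop, right after each fit, or building a new `GridSearchCV` per dataframe, avoids the overwrite.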
Comments (1)
After posting this question, I think I came up with a solution that does what I wanted. I simply had to wrap the code in a function, return a tuple of the outputs I wanted to see, and iterate over the function. Below is my updated code.
I then ran the following code on the function and got the outputs, best scores and best params, for each dataframe used in the random forest model.
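The updated code itself is not preserved in this copy of the answer. What follows is only a minimal sketch of the approach described (a function that fits one search per dataframe and returns a tuple), not the author's actual code. The function name `run_grid_search`, the target column `"y"`, the synthetic dataframes, and the reduced pipeline and grid are all assumptions chosen to keep the example small and runnable.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_grid_search(df, target, pipe, params):
    """Fit a fresh GridSearchCV on one dataframe; return (best_score, best_params)."""
    X = df.drop(columns=[target])
    y = df[target]
    gs = GridSearchCV(estimator=clone(pipe), param_grid=params, cv=3)
    gs.fit(X, y)
    # Copy the results out immediately so no later fit can overwrite them.
    return gs.best_score_, dict(gs.best_params_)


# Hypothetical data: three small dataframes with a target column named "y".
rng = np.random.default_rng(0)
dfs = {name: pd.DataFrame(rng.normal(size=(40, 4)), columns=["a", "b", "c", "y"])
       for name in ["df1", "df2", "df3"]}

pipe = Pipeline(steps=[("scale", StandardScaler()),
                       ("reg", RandomForestRegressor(n_estimators=5, random_state=0))])
# Grid keys are prefixed with the pipeline step name, "reg".
params = {"reg__min_samples_split": [2, 4], "reg__max_features": ["sqrt", "log2"]}

# One (best_score, best_params) tuple per dataframe, keyed by dataframe name.
results = {name: run_grid_search(df, "y", pipe, params) for name, df in dfs.items()}
for name, (score, best) in results.items():
    print(name, round(score, 3), best)
```

Because each call builds its own search object and returns plain values, the entries in `results` are independent of one another, which is what the original loop was missing.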