8. Random Forest

Published: 2023-07-17 23:38:23, word count: 4680

scikit-learn provides two models based on the random forest algorithm:
- RandomForestClassifier for classification problems
- RandomForestRegressor for regression problems

8.1 RandomForestClassifier

RandomForestClassifier is the random forest classification model. Its signature is:
```
class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', 
max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1,
random_state=None, verbose=0, warm_start=False, class_weight=None)
```
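- n_estimators: an integer, the number of decision trees in the forest.
- criterion: a string, the criterion parameter of each decision tree (the function that measures split quality, e.g. 'gini' or 'entropy').
- max_features: an integer, a float, a string, or None; the max_features parameter of each decision tree (how many features are considered when looking for the best split).
- max_depth: an integer or None; the max_depth parameter of each decision tree.
  This parameter is ignored if max_leaf_nodes is not None.
- min_samples_split: an integer; the min_samples_split parameter of each decision tree.
- min_samples_leaf: an integer; the min_samples_leaf parameter of each decision tree.
- min_weight_fraction_leaf: a float; the min_weight_fraction_leaf parameter of each decision tree.
- max_leaf_nodes: an integer or None; the max_leaf_nodes parameter of each base decision tree.
- bootstrap: a boolean. If True, bootstrap sampling is used to draw the training set of each decision tree; if False, every tree is trained on the whole dataset.
- oob_score: a boolean. If True, the out-of-bag samples (those not drawn into a tree's bootstrap sample) are used to estimate the generalization error. Only available when bootstrap=True.
- n_jobs: the degree of parallelism, i.e. the number of jobs run in parallel for fit and predict; -1 uses all processors.
- random_state: the random seed.
- verbose: an integer controlling how much progress output is logged during fitting (0 turns it off).
- warm_start: a boolean. If True, reuse the result of the previous call to fit and add more trees to the existing forest; otherwise a whole new forest is fitted.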
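- class_weight: a dict, a list of dicts, the string 'balanced', the string 'balanced_subsample', or None:
  - If a dict: it gives the weight of each class, in the form {class_label: weight}. A list of dicts gives one such dict per output for multi-output problems.
  - If 'balanced': each class's weight is inversely proportional to that class's frequency in the training set.
  - If 'balanced_subsample': the same as 'balanced', except that the frequencies are computed on each tree's bootstrap sample rather than on the whole training set.
  - If None: every class has weight 1.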
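The 'balanced' option can be checked by hand. The following is a minimal sketch assuming a recent scikit-learn; the toy arrays and the parameter values are illustrative choices, not taken from the text above. It uses the heuristic scikit-learn documents for class_weight='balanced', n_samples / (n_classes * np.bincount(y)), and compares it with the helper `sklearn.utils.class_weight.compute_class_weight`. The defaults in the signature above come from an older release (for example, n_estimators has defaulted to 100 since scikit-learn 0.22), so the example passes the parameters explicitly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Toy, imbalanced data (illustrative only): four samples of class 0, two of class 1
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])

# 'balanced' weight of class c = n_samples / (n_classes * count(c))
# class 0: 6 / (2 * 4) = 0.75, class 1: 6 / (2 * 2) = 1.5
classes = np.unique(y)
manual = len(y) / (len(classes) * np.bincount(y))
library = compute_class_weight("balanced", classes=classes, y=y)
print(manual, library)

# Parameters from the list above, set explicitly instead of relying on
# version-dependent defaults
clf = RandomForestClassifier(
    n_estimators=100,         # number of trees in the forest
    criterion="gini",         # split-quality measure used by every tree
    max_features="sqrt",      # features considered at each split
    bootstrap=True,           # each tree is trained on a bootstrap sample
    class_weight="balanced",  # class weights computed as above
    n_jobs=-1,                # use all processors
    random_state=0,           # reproducible sampling
)
clf.fit(X, y)
```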
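Model attributes:
- estimators_: the list of all fitted base decision trees.
- classes_: the class labels.
- n_classes_: the number of classes.
- n_features_: the number of features seen during fit (replaced by n_features_in_ in newer scikit-learn releases).
- n_outputs_: the number of outputs seen during fit.
- feature_importances_: the importance of each feature.
- oob_score_: the score on the training data obtained from the out-of-bag estimate (available only when oob_score=True).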
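A sketch of reading these attributes after fitting, assuming a recent scikit-learn and a synthetic dataset from `make_classification` (the sizes and seeds are illustrative). Only attributes that exist across versions are read.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data: 500 samples, 8 features, 3 of them informative
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=0
)

# oob_score=True (which needs bootstrap=True) makes oob_score_ available after fit
clf = RandomForestClassifier(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
clf.fit(X, y)

print(len(clf.estimators_))               # one fitted DecisionTreeClassifier per tree
print(clf.classes_, clf.n_classes_)       # class labels and how many there are
print(clf.n_outputs_)                     # 1 for a single target column
print(clf.feature_importances_.round(3))  # impurity-based importances, normalized to sum to 1
print(f"OOB accuracy: {clf.oob_score_:.3f}")
```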
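Model methods:
- fit(X, y[, sample_weight]): train the model.
- predict(X): predict with the model and return the predicted labels.
- predict_log_proba(X): return an array whose elements are the logarithms of the probabilities that X belongs to each class.
- predict_proba(X): return an array whose elements are the probabilities that X belongs to each class.
- score(X, y[, sample_weight]): return the model's performance on the given data (the mean accuracy for this classifier).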
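The relationships between these methods can be checked directly. A sketch assuming a recent scikit-learn, with the Iris dataset as stand-in data: predict returns the class with the highest value in predict_proba, predict_log_proba is the logarithm of predict_proba, and score is the mean accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # shape (n_test_samples, n_classes)
labels = clf.predict(X_test)       # predicted class labels

# Classes with probability 0 give log(0) = -inf; silence numpy's divide warning
with np.errstate(divide="ignore"):
    log_proba = clf.predict_log_proba(X_test)

# Each row of predict_proba is a probability distribution over classes_
row_sums = proba.sum(axis=1)
# predict picks the class with the highest averaged probability
argmax_labels = clf.classes_[np.argmax(proba, axis=1)]
# For classifiers, score() is the mean accuracy on the given data
acc = clf.score(X_test, y_test)
print(proba[:3])
print(acc, accuracy_score(y_test, labels))
```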

8.2 RandomForestRegressor

RandomForestRegressor is the random forest regression model. Its signature is:


```
class sklearn.ensemble.RandomForestRegressor(n_estimators=10, criterion='mse', 
max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1,
random_state=None, verbose=0, warm_start=False)
```

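Parameters: see RandomForestClassifier. They have the same meaning, except that criterion defaults to 'mse' (mean squared error) and there is no class_weight parameter.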
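A minimal regression sketch, assuming a recent scikit-learn and synthetic data from `make_regression` (sizes, noise level, and seeds are illustrative). criterion is left at its default because the squared-error criterion is named differently across versions ('mse' in the signature above, 'squared_error' in newer releases).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data with some Gaussian noise
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# criterion is omitted on purpose so the code runs on old and new versions
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)  # one predicted value per test sample
print(y_pred[:5])
```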

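Model attributes:
- estimators_: the list of all fitted base decision trees.
- n_features_: the number of features seen during fit (replaced by n_features_in_ in newer scikit-learn releases).
- n_outputs_: the number of outputs seen during fit.
- feature_importances_: the importance of each feature.
- oob_score_: the score on the training data obtained from the out-of-bag estimate (available only when oob_score=True).
- oob_prediction_: the predictions for the training data obtained from the out-of-bag estimate.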
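oob_prediction_ holds, for each training sample, the average prediction of the trees whose bootstrap sample did not contain that sample. A sketch (synthetic data, recent scikit-learn assumed) checking that for a single-output regressor oob_score_ is the R² of these out-of-bag predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# oob_score=True stores both oob_prediction_ and oob_score_ after fitting
reg = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
reg.fit(X, y)

# One out-of-bag prediction per training sample
print(reg.oob_prediction_.shape)
# oob_score_ is the R^2 between the true targets and the out-of-bag predictions
print(reg.oob_score_, r2_score(y, reg.oob_prediction_))
```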
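Model methods:
- fit(X, y[, sample_weight]): train the model.
- predict(X): predict with the model and return the predicted values.
- score(X, y[, sample_weight]): return the model's performance on the given data (the coefficient of determination R² for this regressor).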
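A sketch of what predict and score compute, assuming a recent scikit-learn and synthetic data (illustrative values): the forest's prediction is the mean of the individual trees' predictions in estimators_, and score is the R² of those predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = reg.predict(X_test)

# The forest averages the predictions of its fitted trees
tree_mean = np.mean([tree.predict(X_test) for tree in reg.estimators_], axis=0)

# For regressors, score() returns the coefficient of determination R^2
score = reg.score(X_test, y_test)
manual_r2 = r2_score(y_test, y_pred)
print(score, manual_r2)
```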