为随机森林回归创建错误条

发布于 2025-02-10 02:11:58 字数 2937 浏览 2 评论 0 原文

我是机器学习世界的新手，更普遍地是AI。我正在使用Python和Jupyterlab分析包含不同房屋及其价格的数据集。

这是使用的数据集： httpps:

//www.kaggle.com/datasets/datasets/datasets/harlfoxem/harlfoxem/harlfoxem/houseslesprediction该数据集上的森林（Scikit-learn），现在我想绘制模型的误差条。具体来说，我正在使用Forestci软件包，并将此代码完全应用于我的案例：

这是我的代码：

# Regression Forest Example
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import metrics
from sklearn.metrics import r2_score
import numpy as np
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import sklearn.model_selection as xval
import forestci as fci

#import dataset
mpg_data = pd.read_csv(path_to_dataset)

#drop some useless features
mpg_data=mpg_data.drop('date', axis=1)
mpg_data=mpg_data.drop('yr_built', axis=1)
mpg_data = mpg_data.drop(["id"],axis=1)

#separate mpg data into predictors and outcome variable
mpg_X = mpg_data.drop(labels='price', axis=1)
mpg_y = mpg_data['price']

# remove rows where the data is nan
not_null_sel = np.where(mpg_X.isna().sum(axis=1).values == 0)
mpg_X = mpg_X.values[not_null_sel]
mpg_y = mpg_y.values[not_null_sel]

# split mpg data into training and test set
mpg_X_train, mpg_X_test, mpg_y_train, mpg_y_test = xval.train_test_split(
    mpg_X,
    mpg_y,
    test_size=0.25,
    random_state=42)

# Create RandomForestRegressor
mpg_forest = RandomForestRegressor(random_state=42)
mpg_forest.fit(mpg_X_train, mpg_y_train)
mpg_y_hat = mpg_forest.predict(mpg_X_test)

# Plot predicted MPG without error bars
plt.scatter(mpg_y_test, mpg_y_hat)
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

print(r2_score(mpg_y_test, mpg_y_hat))

# Calculate the variance
mpg_V_IJ_unbiased = fci.random_forest_error(mpg_forest, mpg_X_train,
                                            mpg_X_test)

# Plot error bars for predicted MPG using unbiased variance
plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

它似乎有效，但是当绘制图表时，错误栏和预测行都不会出现：

相反，如文档中可见的那样，它应该看起来像图片： http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

原文

I'm new to the world of machine learning and more generally to AI.
I am analyzing a dataset containing characteristics of different houses and their prices using Python and JupyterLab.

Here is the dataset in use:
https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

I applied random forest (scikit-learn) on this dataset and now I would like to plot the error bars of the model.
Specifically, I'm using the ForestCI package and applying exactly this code to my case:
http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

This is my code:

# Regression Forest Example
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import metrics
from sklearn.metrics import r2_score
import numpy as np
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import sklearn.model_selection as xval
import forestci as fci

#import dataset
mpg_data = pd.read_csv(path_to_dataset)

#drop some useless features
mpg_data=mpg_data.drop('date', axis=1)
mpg_data=mpg_data.drop('yr_built', axis=1)
mpg_data = mpg_data.drop(["id"],axis=1)

#separate mpg data into predictors and outcome variable
mpg_X = mpg_data.drop(labels='price', axis=1)
mpg_y = mpg_data['price']

# remove rows where the data is nan
not_null_sel = np.where(mpg_X.isna().sum(axis=1).values == 0)
mpg_X = mpg_X.values[not_null_sel]
mpg_y = mpg_y.values[not_null_sel]

# split mpg data into training and test set
mpg_X_train, mpg_X_test, mpg_y_train, mpg_y_test = xval.train_test_split(
    mpg_X,
    mpg_y,
    test_size=0.25,
    random_state=42)

# Create RandomForestRegressor
mpg_forest = RandomForestRegressor(random_state=42)
mpg_forest.fit(mpg_X_train, mpg_y_train)
mpg_y_hat = mpg_forest.predict(mpg_X_test)

# Plot predicted MPG without error bars
plt.scatter(mpg_y_test, mpg_y_hat)
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

print(r2_score(mpg_y_test, mpg_y_hat))

# Calculate the variance
mpg_V_IJ_unbiased = fci.random_forest_error(mpg_forest, mpg_X_train,
                                            mpg_X_test)

# Plot error bars for predicted MPG using unbiased variance
plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

It seems to work but when the graphs are plotted, neither the error bar nor the prediction line appears:

Instead, as visible in the documentation, it should look like the picture here: http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

送你一个梦 2025-02-17 02:11:58

您忘记添加此行，

plt.plot([5, 45], [5, 45], 'k--')

您的代码应该看起来像这样

plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.plot([5, 45], [5, 45], 'k--')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

You forget to add this line

plt.plot([5, 45], [5, 45], 'k--')

Your code should look like this

plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.plot([5, 45], [5, 45], 'k--')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

回复收藏 0 原文

~没有更多了~