为随机森林回归创建错误条

发布于 2025-02-10 02:11:58 字数 2937 浏览 2 评论 0 原文

我是机器学习世界的新手,更普遍地是AI。 我正在使用Python和Jupyterlab分析包含不同房屋及其价格的数据集。

这是使用的数据集: httpps:

//www.kaggle.com/datasets/datasets/datasets/harlfoxem/harlfoxem/harlfoxem/houseslesprediction该数据集上的森林(Scikit-learn),现在我想绘制模型的误差条。 具体来说,我正在使用Forestci软件包,并将此代码完全应用于我的案例:

这是我的代码:

# Regression Forest Example
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import metrics
from sklearn.metrics import r2_score
import numpy as np
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import sklearn.model_selection as xval
import forestci as fci

#import dataset
mpg_data = pd.read_csv(path_to_dataset)

#drop some useless features
mpg_data=mpg_data.drop('date', axis=1)
mpg_data=mpg_data.drop('yr_built', axis=1)
mpg_data = mpg_data.drop(["id"],axis=1)

#separate mpg data into predictors and outcome variable
mpg_X = mpg_data.drop(labels='price', axis=1)
mpg_y = mpg_data['price']

# remove rows where the data is nan
not_null_sel = np.where(mpg_X.isna().sum(axis=1).values == 0)
mpg_X = mpg_X.values[not_null_sel]
mpg_y = mpg_y.values[not_null_sel]

# split mpg data into training and test set
mpg_X_train, mpg_X_test, mpg_y_train, mpg_y_test = xval.train_test_split(
    mpg_X,
    mpg_y,
    test_size=0.25,
    random_state=42)

# Create RandomForestRegressor
mpg_forest = RandomForestRegressor(random_state=42)
mpg_forest.fit(mpg_X_train, mpg_y_train)
mpg_y_hat = mpg_forest.predict(mpg_X_test)

# Plot predicted MPG without error bars
plt.scatter(mpg_y_test, mpg_y_hat)
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

print(r2_score(mpg_y_test, mpg_y_hat))

# Calculate the variance
mpg_V_IJ_unbiased = fci.random_forest_error(mpg_forest, mpg_X_train,
                                            mpg_X_test)

# Plot error bars for predicted MPG using unbiased variance
plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

它似乎有效,但是当绘制图表时,错误栏和预测行都不会出现:

“

相反,如文档中可见的那样,它应该看起来像图片: http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

I'm new to the world of machine learning and more generally to AI.
I am analyzing a dataset containing characteristics of different houses and their prices using Python and JupyterLab.

Here is the dataset in use:
https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

I applied random forest (scikit-learn) on this dataset and now I would like to plot the error bars of the model.
Specifically, I'm using the ForestCI package and applying exactly this code to my case:
http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

This is my code:

# Regression Forest Example
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import metrics
from sklearn.metrics import r2_score
import numpy as np
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import sklearn.model_selection as xval
import forestci as fci

#import dataset
mpg_data = pd.read_csv(path_to_dataset)

#drop some useless features
mpg_data=mpg_data.drop('date', axis=1)
mpg_data=mpg_data.drop('yr_built', axis=1)
mpg_data = mpg_data.drop(["id"],axis=1)

#separate mpg data into predictors and outcome variable
mpg_X = mpg_data.drop(labels='price', axis=1)
mpg_y = mpg_data['price']

# remove rows where the data is nan
not_null_sel = np.where(mpg_X.isna().sum(axis=1).values == 0)
mpg_X = mpg_X.values[not_null_sel]
mpg_y = mpg_y.values[not_null_sel]

# split mpg data into training and test set
mpg_X_train, mpg_X_test, mpg_y_train, mpg_y_test = xval.train_test_split(
    mpg_X,
    mpg_y,
    test_size=0.25,
    random_state=42)

# Create RandomForestRegressor
mpg_forest = RandomForestRegressor(random_state=42)
mpg_forest.fit(mpg_X_train, mpg_y_train)
mpg_y_hat = mpg_forest.predict(mpg_X_test)

# Plot predicted MPG without error bars
plt.scatter(mpg_y_test, mpg_y_hat)
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

print(r2_score(mpg_y_test, mpg_y_hat))

# Calculate the variance
mpg_V_IJ_unbiased = fci.random_forest_error(mpg_forest, mpg_X_train,
                                            mpg_X_test)

# Plot error bars for predicted MPG using unbiased variance
plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

It seems to work but when the graphs are plotted, neither the error bar nor the prediction line appears:

the error bar and the prediction line not appears

Instead, as visible in the documentation, it should look like the picture here: http://contrib.scikit-learn.org/forest-confidence-interval/auto_examples/plot_mpg.html

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

送你一个梦 2025-02-17 02:11:58

您忘记添加此行,

plt.plot([5, 45], [5, 45], 'k--')

您的代码应该看起来像这样

plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.plot([5, 45], [5, 45], 'k--')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()

You forget to add this line

plt.plot([5, 45], [5, 45], 'k--')

Your code should look like this

plt.errorbar(mpg_y_test, mpg_y_hat, yerr=np.sqrt(mpg_V_IJ_unbiased), fmt='o')
plt.plot([5, 45], [5, 45], 'k--')
plt.xlabel('Reported MPG')
plt.ylabel('Predicted MPG')
plt.show()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文