Visualizing a regression tree model with a continuous numerical target?

Posted on 2025-02-01 04:44:31

I am practicing with this life expectancy dataset from Kaggle (https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who?select=Life+Expectancy+Data.csv) and I want to train and visualize a classification and regression tree model. However, I keep getting an error that says "InvocationException: GraphViz's executables not found". Is this caused by the target being a continuous numerical variable? How can I visualize the model?

Code:

import warnings
warnings.filterwarnings('ignore') 

import pandas as pd
import numpy as np
import seaborn as sn
from sklearn import datasets
from sklearn import metrics
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import export_graphviz
import pydotplus
from IPython.display import Image, display

data = pd.read_csv('Life Expectancy Data.csv')
data = data.dropna(how = 'any')

# Feature selection: drop the columns that are not used as predictors
data = data.drop(columns=['infant deaths', ' thinness 5-9 years', 'Alcohol', 'percentage expenditure', 'Hepatitis B', 'Total expenditure', 'Population', 'Year', 'Country'])

# Create an instance of LabelEncoder.
le = LabelEncoder()

# Fit the encoder on the 'Status' column and
# get the integer-encoded labels back
label = le.fit_transform(data['Status'])

# Remove the original string-valued 'Status' column from the DataFrame
data.drop('Status', axis=1, inplace=True)

# Append the encoded labels to the DataFrame
# as a new last column named 'Status'
data['Status'] = label

# Separate the features from the target, then train the model
X = data.drop(columns=['Life expectancy '])
y = data['Life expectancy ']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

# Visualize the tree. class_names applies only to classifiers, so it is
# omitted for this regressor; feature names are taken from the training columns.
LEtree = tree.export_graphviz(model,
                feature_names = list(X.columns),
                label = 'all',
                rounded = True,
                filled = True)

graph=pydotplus.graph_from_dot_data(LEtree)
display(Image(graph.create_png()))

Full error message:

InvocationException                       Traceback (most recent call last)
Input In [27], in <cell line: 2>()
      1 graph=pydotplus.graph_from_dot_data(LEtree)
----> 2 display(Image(graph.create_png()))

File ~\Anaconda3\lib\site-packages\pydotplus\graphviz.py:1797, in Dot.__init__.<locals>.<lambda>(f, prog)
   1792 # Automatically creates all the methods enabling the creation
   1793 # of output in any of the supported formats.
   1794 for frmt in self.formats:
   1795     self.__setattr__(
   1796         'create_' + frmt,
-> 1797         lambda f=frmt, prog=self.prog: self.create(format=f, prog=prog)
   1798     )
   1799     f = self.__dict__['create_' + frmt]
   1800     f.__doc__ = (
   1801         '''Refer to the docstring accompanying the'''
   1802         ''''create' method for more information.'''
   1803     )

File ~\Anaconda3\lib\site-packages\pydotplus\graphviz.py:1959, in Dot.create(self, prog, format)
   1957     self.progs = find_graphviz()
   1958     if self.progs is None:
-> 1959         raise InvocationException(
   1960             'GraphViz\'s executables not found')
   1962 if prog not in self.progs:
   1963     raise InvocationException(
   1964         'GraphViz\'s executable "%s" not found' % prog)

InvocationException: GraphViz's executables not found


Comments (1)

眼泪淡了忧伤 2025-02-08 04:44:31

Try installing Graphviz in a proper directory, so that its executables can be found.

In Anaconda, you can install it from the Anaconda Prompt with the command below:

conda install -c conda-forge python-graphviz

Then replace the previously installed graphviz directory. This might help you with the problem.
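As a side note, the error message points at missing Graphviz executables rather than at the continuous target, so a regression tree should be just as drawable as a classification tree once a renderer is available. The sketch below is only an illustration under stated assumptions: it uses synthetic data as a stand-in for the life expectancy table, three feature names borrowed from the question as labels, a hypothetical helper `graphviz_available`, and an arbitrary `max_depth=3` to keep the picture readable. It checks whether `dot` is on the PATH and then draws the tree with `sklearn.tree.plot_tree`, which relies on matplotlib alone and so does not need Graphviz.

```python
import shutil

import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeRegressor, plot_tree


def graphviz_available():
    """Hypothetical helper: True if the Graphviz `dot` executable is on PATH."""
    return shutil.which("dot") is not None


# Synthetic stand-in for the life expectancy data (assumption: illustration only).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 50 + 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 1, 200)
feature_names = ["Schooling", "Adult Mortality", "GDP"]

# A shallow tree keeps the plot readable; max_depth=3 is an arbitrary choice.
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# If this prints False, pydotplus cannot render the tree either.
print("Graphviz dot found:", graphviz_available())

# plot_tree draws with matplotlib only, so it works without Graphviz.
fig, ax = plt.subplots(figsize=(14, 6))
annotations = plot_tree(model, feature_names=feature_names,
                        filled=True, rounded=True, ax=ax)
fig.savefig("life_expectancy_tree.png", dpi=150)
```

With the real data, the same `plot_tree` call could take the fitted model from the question and `feature_names=list(X.columns)`. An unrestricted tree on that dataset is likely to be very large, so limiting the depth, either on the model or through `plot_tree`'s own `max_depth` argument, may be needed to get a legible figure.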
