Path error when visualising a decision tree classification with dtreeviz
I am trying to visualise my decision tree classification using the code from the following notebook on GitHub: https://github.com/parrt/dtreeviz/blob/master/notebooks/dtreeviz_spark_visualisations.ipynb
When I run this line:
df = spark.read.parquet("../../dtreeviz/testing/testlib/models/fixtures/spark_3_0_decision_tree_classifier.model/training_df")
I get the following error:
AnalysisException                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12920/640132816.py in <module>
----> 1 df = spark.read.parquet("../../dtreeviz/testing/testlib/models/fixtures/spark_3_0_decision_tree_classifier.model/training_df")

C:\spark\spark-3.2.1-bin-hadoop2.7\python\pyspark\sql\readwriter.py in parquet(self, *paths, **options)
    299                 int96RebaseMode=int96RebaseMode)
    300
--> 301         return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
    302
    303     def text(self, paths, wholetext=False, lineSep=None, pathGlobFilter=None,

C:\spark\spark-3.2.1-bin-hadoop2.7\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1319
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323

C:\spark\spark-3.2.1-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
    115             # Hide where the exception came from that shows a non-Pythonic
    116             # JVM exception message.
--> 117             raise converted from None
    118         else:
    119             raise

AnalysisException: Path does not exist: file:/C:/Users/dtreeviz/testing/testlib/models/fixtures/spark_3_0_decision_tree_classifier.model/training_df
I followed all the instructions at https://github.com/parrt/dtreeviz
I couldn't find that path on my local machine, and I am confused about what the code does, as I am not familiar with the Parquet format. It looks like a path, but what does .model refer to?
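Note that spark.read.parquet resolves a relative local path against the notebook's current working directory, which is why the exception points at file:/C:/Users/dtreeviz/... . A minimal sketch (standard library only, using the same relative path as above) to see where that path actually resolves on a given machine before Spark tries to read it:

import os

rel_path = "../../dtreeviz/testing/testlib/models/fixtures/spark_3_0_decision_tree_classifier.model/training_df"
abs_path = os.path.abspath(rel_path)       # where the relative path resolves from the current working directory
print(abs_path, os.path.exists(abs_path))  # exists only if the dtreeviz repository is checked out at that location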
Comments (1)
I took a look at the notebook. Indeed, it contains some unnecessary code that was used for development/testing.
In your case, the 'df' dataframe is not needed for the actual visualisations. You can comment out that line and the visualisations should work.
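Since the fixture Parquet only exists inside a checkout of the dtreeviz repository, another option is to train the Spark decision tree on your own data and point the notebook's visualisation cells at that model instead. A minimal sketch, assuming an existing SparkSession named spark and made-up column names:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

# Hypothetical training data; substitute your own DataFrame and columns.
train_df = spark.createDataFrame(
    [(5.1, 3.5, 0), (6.2, 2.9, 1), (5.9, 3.0, 1), (4.7, 3.2, 0)],
    ["sepal_length", "sepal_width", "label"],
)

# Spark ML expects the features packed into a single vector column.
assembler = VectorAssembler(inputCols=["sepal_length", "sepal_width"], outputCol="features")
features_df = assembler.transform(train_df)

# The fitted model is what dtreeviz visualises, so the repository's
# fixture Parquet is not needed at all.
dt_model = DecisionTreeClassifier(featuresCol="features", labelCol="label").fit(features_df)

The later cells of the notebook can then be run against dt_model (and your own feature/target data) rather than the model and training_df loaded from the fixtures directory.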