Search for the corresponding node in a regression tree using rpart

Posted on 2024-10-19 00:38:07

I'm pretty new to R and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.

Thanks to R the calibration part is easy to do and easy to control.

#the package rpart is needed
library(rpart)

# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + 
                      Attribute4 + Attribute5, 
                      method="anova", data=my_data, 
                      control=rpart.control(minsplit=100, cp=0.0001))

After calibrating a large decision tree, I want to find, for a given sample of new data, the cluster (leaf) that each observation falls into, and thus its forecasted value.
The predict function seems to be perfect for this need.

# read validation data
validationData <- read.csv("my_sample.csv", sep=",", header=TRUE)

# search for the probability in the tree
predict <- predict(tree, newdata=validationData, class="prob")

# dump them in a file
write.table(predict, file="dump.txt") 

However, with the predict method I only get the forecasted ratio of my new elements, and I can't find a way to get the decision tree leaf that each new element belongs to.

I think it should be pretty easy to get since the predict method must have found that leaf in order to return the ratio.

There are several values that can be given to the predict method through the class= argument, but for a regression tree they all seem to return the same thing (the value of the target attribute of the decision tree).

Does anyone know how to get the corresponding node in the decision tree?

Analyzing the node with the path.rpart method would help me understand the results.

Comments (4)

捂风挽笑 2024-10-26 00:38:07

Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.

My solution is pretty kludgy, but I don't think there's a better way. The trick is to replace the predicted y values in the model frame with the corresponding node numbers.

tree2 = tree
tree2$frame$yval = as.numeric(rownames(tree2$frame))
predict = predict(tree2, newdata=validationData)

Now the output of predict will be node numbers as opposed to predicted y values.

(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
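A self-contained sketch of this trick, checked against the training data. It assumes the built-in mtcars data and an arbitrary formula purely for illustration; the names fit, fit_nodes and leaf are chosen here and are not from the original post.

```r
library(rpart)

# Illustrative regression tree on a built-in data set (assumption: any
# anova tree behaves the same way for this purpose)
fit <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")

# Copy the tree and overwrite each node's fitted value with its node number;
# the row names of fit$frame are the rpart node numbers
fit_nodes <- fit
fit_nodes$frame$yval <- as.numeric(rownames(fit_nodes$frame))

# predict() on the modified copy now returns the leaf's node number per row
leaf <- predict(fit_nodes, newdata = mtcars)

# Cross-check on the training rows: fit$where is a row index into fit$frame
leaf_from_where <- as.numeric(rownames(fit$frame))[fit$where]
stopifnot(all(leaf == leaf_from_where))
```

Working on a copy keeps the original tree usable for ordinary predictions.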

物价感观 2024-10-26 00:38:07

You can use the partykit package:

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

library("partykit")
fit.party <- as.party(fit)
predict(fit.party, newdata = kyphosis[1:4, ], type = "node")

For your example just set

predict(as.party(tree), newdata = validationData, type = "node")

白龙吟 2024-10-26 00:38:07

I think what you want is type="vector" instead of class="prob" (I don't think class is an accepted parameter of the predict method), as explained in the rpart docs:

If type="vector": vector of predicted responses. For regression trees this is the mean response at the node, for Poisson trees it is the estimated response rate, and for classification trees it is the predicted class (as a number).
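A likely reason every class= value appeared to return the same thing: predict.rpart has no class argument, and because its signature ends in ..., an unrecognized argument is accepted and ignored. The sketch below illustrates this on the built-in mtcars data (an assumption made for the example; the variable names are also made up here).

```r
library(rpart)

# Illustrative regression tree; data set and formula are assumptions
fit <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")
new_rows <- mtcars[1:5, ]

p_default <- predict(fit, newdata = new_rows)                   # type defaults to "vector"
p_class   <- predict(fit, newdata = new_rows, class = "prob")   # class= falls into ... and is ignored
p_vector  <- predict(fit, newdata = new_rows, type = "vector")  # explicit type

# All three calls give the same node means
same_class  <- identical(p_default, p_class)
same_vector <- identical(p_default, p_vector)
```

So type="vector" is the documented spelling, but for a regression tree it still yields the predicted value rather than the leaf.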

平定天下 2024-10-26 00:38:07
  1. treeClust::rpart.predict.leaves(tree, validationData) returns node number
  2. also tree$where identifies the leaf for each row of the training set, as a row index into tree$frame (the row names of tree$frame are the node numbers)
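For the training set, the second point can be combined with path.rpart, which the question mentions. A minimal sketch, assuming the built-in mtcars data and illustrative variable names:

```r
library(rpart)

# Illustrative regression tree; data set and formula are assumptions
fit <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")

# fit$where holds, per training row, the row index of its leaf in fit$frame;
# indexing the frame's row names converts that into rpart node numbers
node_numbers <- as.numeric(rownames(fit$frame))[fit$where]
names(node_numbers) <- rownames(mtcars)

# Splitting rules that lead from the root to each leaf in use
paths <- path.rpart(fit, nodes = unique(node_numbers), print.it = FALSE)
```

paths is a list keyed by node number, so paths[[as.character(node_numbers[1])]] gives the rules for the first training row.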