R randomForest: proximity for new objects

Posted 2024-12-21 17:29:36

I trained a random forest:

model <- randomForest(x, y, proximity=TRUE)

To predict y for new objects, I use

y_pred <- predict(model, xnew)

How can I calculate the proximity between the new objects (xnew) and the training set (x) using the existing forest (model)? The proximity option in the predict function gives only the proximities among the new objects (xnew). I could run randomForest unsupervised again on the combined data set (x and xnew) to get the proximities, but I think there must be a way to avoid building the forest again and use the existing one instead.

Thanks!
Kilian


浅听莫相离 2024-12-28 17:29:36

I believe what you want is to specify your test observations in the randomForest call itself, something like this:

library(randomForest)

set.seed(71)
ind <- sample(1:150, 140, replace = FALSE)
train <- iris[ind,]
test <- iris[-ind,]

iris.rf1 <- randomForest(x = train[,1:4],
                         y = train[,5],
                         xtest = test[,1:4],
                         ytest = test[,5],
                         importance=TRUE,
                         proximity=TRUE)

dim(iris.rf1$test$proximity)
[1]  10 150

So that gives you the proximity from the ten test cases to all 150 cases (the 10 test cases themselves plus the 140 training cases).

The only other option, I think, would be to call predict on your new cases rbind-ed to the original training cases. That way you don't need to have your test cases available up front in the randomForest call.

In that case, you'll want to use keep.forest = TRUE in the randomForest call and, of course, set proximity = TRUE when you call predict.
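
The rbind approach can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: it assumes the same iris split as above, and the names iris.rf2, combined, n_new and prox_new_train are made up for the example. As far as I know, keep.forest already defaults to TRUE when no xtest is supplied, so setting it explicitly here is only for clarity.

```r
library(randomForest)

set.seed(71)
ind   <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test  <- iris[-ind, ]

# Fit once on the training data only; keep.forest = TRUE keeps the trees
# so that predict() can reuse them later.
iris.rf2 <- randomForest(x = train[, 1:4],
                         y = train[, 5],
                         keep.forest = TRUE)

# Stack the new cases on top of the training cases and run both
# through the existing forest (no refitting).
combined <- rbind(test[, 1:4], train[, 1:4])
pred <- predict(iris.rf2, combined, proximity = TRUE)

# pred$proximity is square over all stacked cases; keep the rows for the
# new cases and the columns for the training cases.
n_new <- nrow(test)
prox_new_train <- pred$proximity[seq_len(n_new), -seq_len(n_new), drop = FALSE]
dim(prox_new_train)
```

Each row of prox_new_train then corresponds to one new case and each column to one training case. Note that this runs every training case down every tree, so these proximities are not restricted to out-of-bag trees.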
