R randomForest: proximity for new objects
I trained a random forest:
model <- randomForest(x, y, proximity=TRUE)
When I want to predict y for new objects, I use
y_pred <- predict(model, xnew)
How can I calculate the proximity between the new objects (xnew) and the training set (x) based on the already existing forest (model)?
The proximity option in the predict function gives only the proximities among the new objects (xnew). I could run randomForest unsupervised again on a combined data set (x and xnew) to get the proximities, but I think there must be some way to avoid building the forest again and to use the already existing one instead.
Thanks!
Kilian
Comments (1)
I believe what you want is to specify your test observations in the randomForest call itself (via the xtest argument). In an example with ten test cases out of 150 observations, that gives you the proximity from the ten test cases to all 150.
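The code that accompanied this suggestion did not survive on this page, so here is a minimal sketch of what such a call could look like. The iris data, the seed, and the 140/10 split are assumptions chosen only to match the "ten test cases to all 150" description, not the answerer's original example.

```r
library(randomForest)

set.seed(71)
# Illustrative split: hold out 10 of the 150 iris rows as the "new" cases
test_idx <- sample(nrow(iris), 10)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Passing the new cases as xtest makes the forest record their proximities
rf <- randomForest(x = train[, 1:4], y = train[, 5],
                   xtest = test[, 1:4],
                   proximity = TRUE)

# One row per test case; columns cover the test and the training cases
prox_test <- rf$test$proximity
dim(prox_test)
```

According to the package documentation, this matrix holds the proximities among the test cases as well as between the test and training cases. Check colnames(prox_test) to see which columns belong to the training set before subsetting.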
The only other option, I think, would be to call predict on your new cases rbind-ed to the original training cases. But that way you don't need to have your test cases up front in the randomForest call. In that case, you'll want to use keep.forest = TRUE in the randomForest call and, of course, set proximity = TRUE when you call predict.
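A rough sketch of this second route, reusing an existing forest, might look as follows. The iris data and the names x, y, xnew, and prox_new_train are illustrative assumptions. The key idea is that predict returns proximities among all rows it is given, so the new-versus-training block can be sliced out of the combined matrix.

```r
library(randomForest)

set.seed(71)
# Illustrative data: 140 training rows and 10 "new" rows from iris
test_idx <- sample(nrow(iris), 10)
x    <- iris[-test_idx, 1:4]
y    <- iris[-test_idx, 5]
xnew <- iris[test_idx, 1:4]

# The forest must be stored in the model object for predict to reuse it
model <- randomForest(x, y, keep.forest = TRUE)

# Send training and new cases through the existing trees together
out <- predict(model, rbind(x, xnew), proximity = TRUE)

n <- nrow(x)
m <- nrow(xnew)
# Rows: the m new cases; columns: the n training cases
prox_new_train <- out$proximity[n + seq_len(m), seq_len(n)]
dim(prox_new_train)
```

No trees are rebuilt here. The cost is that the full (n + m) by (n + m) proximity matrix is computed before slicing, which may be heavy for large training sets.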