How do I manipulate GLM coefficients in R?
How can I manipulate a GLM object to get around the error below? I would like predict to treat unseen factor levels as the base case (that is, give them a coefficient of zero).
> master <- data.frame(x = factor(floor(runif(100,0,3)), labels=c("A","B","C")), y = rnorm(100))
> part.1 <- master[master$x == 'C',]
> part.2 <- master[master$x == 'A' | master$x == 'B',]
> model.2 <- glm(y ~ x, data=part.2)
> predict.1 <- predict(model.2, part.1)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor 'x' has new level(s) C
I tried doing this:
> model.2$xlevels$x <- c(model.2$xlevels, "C")
> predict.1 <- predict(model.2, part.1)
But it's not scoring the model correctly:
> predict.1[1:5]
         2          3          6          8         10
0.03701494 0.03701494 0.03701494 0.03701494 0.03701494
> summary(model.2)
Call:
glm(formula = y ~ x, data = part.2)
<snip>
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.12743    0.18021   0.707    0.482
xB          -0.09042    0.23149  -0.391    0.697
Every element of predict.1 should be 0.12743, the intercept.
This is obviously just a trimmed-down version. My real model has 25 or so variables in it, so an answer like predict.1 <- rep(0.12743, nrow(part.1)) is not useful to me.
Thanks for any help!
Comments (2)
If you know that observations where x=='C' behave exactly like x=='A', then you can just do:
which will give you your pure intercept model.
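The snippet this answer refers to is missing here. One way to act on the idea is to recode unseen levels to the baseline level in the new data before calling predict, leaving the fitted model untouched. The sketch below is not necessarily the answer's original code: the helper name collapse_unseen is hypothetical, the data are a deterministic stand-in for the question's random draws, and it assumes default treatment contrasts (so the first level is the baseline) and factors that enter the formula directly as data frame columns.

```r
set.seed(42)  # assumption: fixed seed, not in the original question

# Stand-in for the question's data; rep() guarantees all three levels occur.
master <- data.frame(
  x = factor(rep(c("A", "B", "C"), length.out = 100)),
  y = rnorm(100)
)
part.1 <- master[master$x == "C", ]
part.2 <- master[master$x %in% c("A", "B"), ]
model.2 <- glm(y ~ x, data = part.2)  # glm drops the unused level "C"

# Hypothetical helper: map every level the model never saw to the
# model's first (baseline) level, for each factor recorded in xlevels.
collapse_unseen <- function(model, newdata) {
  for (v in names(model$xlevels)) {
    if (!(v %in% names(newdata))) next       # skip terms that are not plain columns
    seen <- model$xlevels[[v]]
    vals <- as.character(newdata[[v]])
    vals[!is.na(vals) & !(vals %in% seen)] <- seen[1]
    newdata[[v]] <- factor(vals, levels = seen)
  }
  newdata
}

predict.1 <- predict(model.2, collapse_unseen(model.2, part.1))

# Each prediction equals the intercept, since "C" is now scored as "A".
intercept <- unname(coef(model.2)[1])
all.equal(unname(predict.1), rep(intercept, nrow(part.1)))
```

Because the helper loops over every entry in model$xlevels, the same call should carry over to a model with many factor predictors. Whether scoring an unseen level as the baseline is statistically justified is a separate question that depends on the data.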
I disagree that you should expect any prediction. You fitted the model on data with no observations where the factor x takes the value "C", so you should not expect a prediction for that level. Your attempt to produce predictions for 1:5 should also fail.