What does it mean when a variable is highly correlated with the intercept in SAS PROC LOGISTIC?
One of my PROC LOGISTIC reports shows that a certain variable is highly correlated with the intercept. How should I interpret this? What should I change to fix the correlation?
EDIT: Let me try asking this from a more theoretical point of view. In the estimated correlation matrix output by most logistic regression packages, what does it mean if the intercept estimate is highly correlated with the coefficient of a certain variable? How would you deal with such a situation? Hopefully this is a clearer way of asking the question. Thank you very much, everyone.
Comments (1)
A positive correlation between the estimated intercept and the estimated coefficient of a covariate means that the bulk of that covariate's values are negative (and vice versa: a negative correlation will be seen when the values are mostly positive).
This is not restricted to logistic regression, and might be easier to see with linear regression. Think of the scatterplot of your values as a blob to the right of the y-axis (that is, mostly positive x values), and draw the best-fitting linear regression line. Now increase both its y-intercept and its slope a bit: if the "blob" is far enough from the axis, the line will miss it completely. So you can't move both parameters in the same direction and still get a reasonably fitting line; raising one has to be compensated by lowering the other. In other words, the estimates are negatively correlated.
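To make the linear-regression picture concrete, here is a minimal sketch in Python with NumPy (not SAS). It relies on the standard result that the covariance matrix of the least-squares estimates is proportional to the inverse of X'X, so the error variance cancels out of the correlation. The function name `intercept_slope_corr` and the two small covariate vectors are made up for illustration.

```python
import numpy as np

def intercept_slope_corr(x):
    """Correlation between the OLS intercept and slope estimates for covariate x."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with columns [1, x]
    cov = np.linalg.inv(X.T @ X)               # Cov(b0, b1) up to the factor sigma^2
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical covariate values: one "blob" right of the y-axis, one left of it.
x_right = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
x_left = -x_right

corr_right = intercept_slope_corr(x_right)  # strongly negative
corr_left = intercept_slope_corr(x_left)    # strongly positive
print(corr_right, corr_left)
```

Note that no response values are needed: in linear regression the correlation between the two estimates depends only on where the x values sit relative to 0.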
In practice, this is not a big deal. It is true that the estimate of the intercept will have high variability, but that is not surprising if the bulk of your data is far from 0. Often x=0 is not meaningful, so you don't even care about the intercept. If you just can't bear to see those large correlations, center your x variable (subtract its mean). The y-axis will then sit in the middle of your data, and the correlation will largely vanish (in linear regression it becomes exactly zero). Of course, the meaning of the intercept changes as well: it now describes the response at the mean of x rather than at x=0, which is often desirable.
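The effect of centering can also be checked for logistic regression itself. The following is a rough sketch rather than a reproduction of what PROC LOGISTIC does: it fits the model with a hand-written Newton-Raphson loop in NumPy on simulated data, and takes the correlation from the inverse of the Fisher information X'WX. The data, the seed, the true coefficients, and the helper name `fit_logistic` are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson.

    Returns the estimates and the correlation between b0 and b1.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])              # Fisher information X'WX
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Re-evaluate the information at the final estimates.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))    # estimated covariance of (b0, b1)
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return beta, corr

# Simulated data (assumption): covariate centered near 10, far from 0.
rng = np.random.default_rng(0)
x = rng.normal(10.0, 1.0, 500)
p_true = 1.0 / (1.0 + np.exp(-(-5.0 + 0.5 * x)))
y = (rng.random(500) < p_true).astype(float)

beta_raw, corr_raw = fit_logistic(x, y)
beta_centered, corr_centered = fit_logistic(x - x.mean(), y)

print("raw x:      corr(intercept, slope) =", corr_raw)
print("centered x: corr(intercept, slope) =", corr_centered)
print("slopes:", beta_raw[1], beta_centered[1])
```

With the raw covariate the correlation should come out close to -1, while after centering it should be close to 0. It is generally not exactly 0 in the logistic case, because the observations are weighted by p(1-p). The slope estimate is the same in both fits; only the intercept and its interpretation change.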