Multicollinearity test using car::vif

Posted on 2025-01-20 19:02:38

I am trying to run a car::vif() test in R, to test for multicollinearity. However, when I run the code

library(car)  # provides vif()

reg.model1 <- log(Price2) ~ Detached.house + Semi.detached.house +
  Attached.houses + Apartment +
  Stock.apartment + Housing.cooperative + Sole.owner + Age +
  BRA + Bedrooms + Balcony + Lotsize + Sentrum + Alna + Vestre.Aker +
  Nordstrand + Marka + Ullern + Østensjø + Søndre.Nordstrand + Stovner +
  Nordre.Aker + Bjerke + Grorud + Gamle.Oslo + St..Hanshaugen +
  Grünerløkka + Sagene + Frogner
reg1 <- lm(formula = reg.model1, data = Data)
vif(reg1)

I receive this error in the console:

Error in vif.default(reg1) : there are aliased coefficients in the model.

From what I have read, this means that something in the model is highly correlated. When I look at the correlation matrix, the only thing that is highly correlated is the dependent variable Price. But I also read somewhere that it is fine for the dependent variable to be highly correlated. I also found that BRA has a correlation of 0.8, so I tried running the model again without it, and I still get the same error. Does anyone know what the problem could be, or what I could try to do differently?

别忘他 2025-01-27 19:02:38

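This tells you that some set of predictors is perfectly (multi)collinear. If you look at coef(reg1) you will see at least one NA value, and if you run summary(reg1) you will see the message

([n] not defined because of singularities)

(for some n >= 1). Checking the pairwise correlations of the predictors is not enough: if you have (for example) predictors A, B, C, none of whose pairwise correlations is exactly 1 in absolute value, they can still be multicollinear. (Probably the most common case is when A, B, C are dummy variables describing a mutually exclusive and exhaustive set of possibilities, i.e. for each observation exactly one of A, B, C is 1 and the other two are 0, so that A + B + C always equals the intercept column. I strongly suspect this is what is happening with your last 16 or so variables, which appear to be districts of Oslo ...)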
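The A, B, C case can be reproduced with a small simulation. This is only a sketch on made-up data (the data frame `d` and its columns are hypothetical, not the asker's `Data`), using base R only:

```r
set.seed(1)
n <- 300

# Hypothetical data: each observation falls in exactly one of three groups
group <- sample(c("A", "B", "C"), n, replace = TRUE)
d <- data.frame(
  A = as.numeric(group == "A"),
  B = as.numeric(group == "B"),
  C = as.numeric(group == "C"),
  x = rnorm(n)
)
d$y <- 1 + 0.5 * d$A - 0.3 * d$B + 2 * d$x + rnorm(n)

# No pairwise correlation among the dummies is exactly +1 or -1 ...
cors <- cor(d[, c("A", "B", "C")])
max_abs_cor <- max(abs(cors[upper.tri(cors)]))

# ... yet A + B + C reproduces the intercept column, so lm() cannot
# estimate all three dummy coefficients and reports one of them as NA
fit <- lm(y ~ A + B + C + x, data = d)
na_coefs <- names(coef(fit))[is.na(coef(fit))]

print(max_abs_cor)
print(na_coefs)
```

A model like this is the kind that makes car::vif() stop with the "aliased coefficients" error, even though the correlation matrix looks harmless.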

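Checking which regression coefficients are NA (as @Axeman suggested) can point to the problem; another answer explains how to use model.matrix() and caret::findLinearCombos to find exactly which sets of predictors cause it. (If all of your predictors are simple numeric variables, you can skip model.matrix().)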
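As a rough illustration of that diagnosis, the sketch below uses only base R (`qr()` and `alias()`) as a stand-in for caret::findLinearCombos, so it does not show caret's own interface. The data and the column names `north`, `south`, `east`, `area` are invented for the example:

```r
set.seed(2)
n <- 200

# Hypothetical data: three dummies that cover every observation
district <- sample(c("north", "south", "east"), n, replace = TRUE)
d <- data.frame(
  north = as.numeric(district == "north"),
  south = as.numeric(district == "south"),
  east  = as.numeric(district == "east"),
  area  = runif(n, 30, 200)
)
d$y <- 10 + 0.02 * d$area + 0.3 * d$north + rnorm(n)

fit <- lm(y ~ north + south + east + area, data = d)

# Design matrix actually used by lm(), including the intercept column
X <- model.matrix(fit)

# A rank smaller than the number of columns means exact linear dependence
rank_deficit <- ncol(X) - qr(X)$rank

# alias() expresses each aliased term as a combination of the other columns
dep <- alias(fit)$Complete

print(rank_deficit)
print(dep)
```

The row names of `dep` are the terms that could not be estimated, and the entries in each row show which other columns they duplicate.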

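If your problem is indeed caused by including a dummy variable for every possible geographic region, the simplest/best solution is to include the geographic region (district) in the model as a factor: if you do this, R will automatically generate a set of dummies/contrasts, and it will automatically leave one dummy out to avoid this kind of problem. If you later want to go back and get predicted values for every district, you can use tools from the emmeans or effects packages.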
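A minimal sketch of the factor approach, on simulated data: the column names `district`, `area` and `log_price` are assumptions, and only three of the district names from the question are used. It relies on base R's predict() for the per-district values rather than emmeans or effects:

```r
set.seed(3)
n <- 300
districts <- c("Frogner", "Sagene", "Grorud")  # three names borrowed from the question

# Hypothetical data with a single factor column instead of one dummy per district
d <- data.frame(
  district = factor(sample(districts, n, replace = TRUE)),
  area     = runif(n, 30, 200)
)
d$log_price <- 14 + 0.005 * d$area + 0.2 * (d$district == "Frogner") +
  rnorm(n, sd = 0.1)

# R builds the dummies itself and omits the first factor level as the reference
fit <- lm(log_price ~ district + area, data = d)
coef_names <- names(coef(fit))
any_na <- anyNA(coef(fit))

# Predicted values for every district at a common area, reference level included
newdata <- data.frame(
  district = factor(districts, levels = levels(d$district)),
  area     = 100
)
pred <- predict(fit, newdata = newdata)
names(pred) <- districts

print(coef_names)
print(pred)
```

The reference level has no coefficient of its own, but it still gets a prediction, because its effect is absorbed into the intercept.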

无戏配角 2025-01-27 19:02:38

I searched around for solutions, since I couldn't solve the problem from the answers alone. The answers did, however, help me understand my problem better. The solution turned out to be as simple as putting a minus instead of a plus in front of one dummy variable from each group. This was my original code, as posted earlier:

reg.model1 <- log(Price2) ~ Detached.house + Semi.detached.house +
  Attached.houses + Apartment +
  Stock.apartment + Housing.cooperative + Sole.owner + Age +
  BRA + Bedrooms + Balcony + Lotsize + Sentrum + Alna + Vestre.Aker +
  Nordstrand + Marka + Ullern + Østensjø + Søndre.Nordstrand + Stovner +
  Nordre.Aker + Bjerke + Grorud + Gamle.Oslo + St..Hanshaugen +
  Grünerløkka + Sagene + Frogner
reg1 <- lm(formula = reg.model1, data = Data)
vif(reg1)

To solve my issue, I simply had to change my code to:

# In a formula, "- term" removes that term, so Apartment, Sole.owner and
# Frogner are left out and act as the reference categories
reg.model1 <- log(Price2) ~ Detached.house + Semi.detached.house +
  Attached.houses - Apartment +
  Stock.apartment + Housing.cooperative - Sole.owner + Age +
  BRA + Bedrooms + Balcony + Lotsize + Sentrum + Alna + Vestre.Aker +
  Nordstrand + Marka + Ullern + Østensjø + Søndre.Nordstrand + Stovner +
  Nordre.Aker + Bjerke + Grorud + Gamle.Oslo + St..Hanshaugen +
  Grünerløkka + Sagene - Frogner
reg1 <- lm(formula = reg.model1, data = Data)
vif(reg1)

As you can see, I have three sets of dummy variables, and to make sure perfect multicollinearity doesn't occur I have to remove one dummy from each set. I removed Apartment for the type of home, Sole.owner for the type of ownership, and Frogner for the district. This website explains the problem and the solution much better and more simply than I can (https://www.learndatasci.com/glossary/dummy-variable-trap/)!
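Once one dummy per set is dropped, the variance inflation factors are well defined. The sketch below computes them by hand from the textbook definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The helper `manual_vif` and the simulated data are hypothetical; for a model in which every term has a single coefficient the result should agree with car::vif(), but this is not car's implementation:

```r
set.seed(4)
n <- 400

# Hypothetical data: three home types, with "apartment" left out as the reference
type <- sample(c("apartment", "detached", "semi"), n, replace = TRUE)
d <- data.frame(
  detached = as.numeric(type == "detached"),
  semi     = as.numeric(type == "semi"),
  area     = runif(n, 30, 200)
)
d$bedrooms  <- round(d$area / 40 + rnorm(n, sd = 0.5))
d$log_price <- 14 + 0.005 * d$area + 0.1 * d$detached + rnorm(n, sd = 0.1)

fit <- lm(log_price ~ detached + semi + area + bedrooms, data = d)

# Hypothetical helper: regress each predictor on all the others
manual_vif <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(fit)
print(round(vifs, 2))
```

In this simulation `area` and `bedrooms` are built to be correlated, so their values come out well above 1, while the two dummies stay close to 1.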
