How do I ignore NA data when I run the lm function?

Posted 2024-10-03 20:30:31 | 460 characters | 10 views | 0 comments


My question is rather simple, but I could not get it resolved after trying a lot of things.

I have two data frames.

> a
  col1 col2 col3 col4
1    1    2    1    4
2    2   NA    2    3
3    3    2    3    2
4    4    3    4    1

> b
  col1 col2 col3 col4
1    5    2    1    4
2    2   NA    2    3
3    3   NA    3    2
4    4    3    4    1

Can I do a lm(a ~ b) to fit the data in a and b?

If I do, how do I ignore the NA data?

Thanks, Dan


眼中杀气 2024-10-10 20:30:31


Generally, the regression functions in R fit the model using only the complete cases, so you do not usually need to do anything special to drop rows that contain NA. Your question seems a bit vague, and it is not clear why you are putting an entire matrix (or is that a data.frame?) on the left-hand side of a formula. The lm() function can do multivariate analyses, but people who want to do so will generally ask more specific questions.

> lm(a$col1 ~ b$col1+b$col2 +b$col3+b$col4)

Call:
lm(formula = a$col1 ~ b$col1 + b$col2 + b$col3 + b$col4)

Coefficients:
(Intercept)       b$col1       b$col2       b$col3       b$col4  
         16           -3           NA           NA           NA

After the 2 incomplete cases are dropped, only two cases are left, and that is too little data to estimate any further coefficients.
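To check the complete-case behaviour directly, here is a minimal sketch that rebuilds the two data frames from the question and asks the fitted model which rows it used. The combined data frame `d` and its column names `y`, `x1`, `x2` are assumptions made for illustration, not part of the answer above.

```r
# Rebuild the data frames from the question
a <- data.frame(col1 = 1:4, col2 = c(2, NA, 2, 3), col3 = 1:4, col4 = 4:1)
b <- data.frame(col1 = c(5, 2, 3, 4), col2 = c(2, NA, NA, 3),
                col3 = 1:4, col4 = 4:1)

# Hypothetical combined frame: response taken from a, predictors from b
d <- data.frame(y = a$col1, x1 = b$col1, x2 = b$col2)

# Rows with an NA in any model variable are dropped before fitting
fit <- lm(y ~ x1 + x2, data = d)

nobs(fit)                    # number of rows actually used
rownames(model.frame(fit))   # which rows survived NA removal
```

Putting the variables in one data frame and passing it through the `data` argument keeps the formula short and makes it easier to see which rows were dropped.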


不羁少年 2024-10-10 20:30:31


If a and b are data frames, and you want to regress the individual values in a on the values in b, then you need to convert them to vectors. For example:

> lm(as.vector(as.matrix(a))~as.vector(as.matrix(b)))

Call:
lm(formula = as.vector(as.matrix(a)) ~ as.vector(as.matrix(b)))

Coefficients:
            (Intercept)  as.vector(as.matrix(b))  
               8.418239                -0.005241

Missing data is dropped by default: see help(lm) and the na.action argument, whose default is taken from getOption("na.action") and is normally na.omit. The summary method on an lm object reports how many observations were dropped.
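A minimal sketch of the na.action options, using made-up vectors `x` and `y` (assumptions for illustration only). It contrasts na.omit with na.exclude, which drops the same rows for fitting but pads the residuals back to the original length, and with na.fail, which refuses to fit.

```r
# Made-up data: one missing predictor value and one missing response value
d <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1))

# na.omit: incomplete rows are dropped and do not reappear anywhere
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
length(residuals(fit_omit))   # only the complete rows

# na.exclude: same fit, but residuals keep NA placeholders for dropped rows
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)
length(residuals(fit_excl))   # same length as the original data

# na.fail: stop with an error instead of silently dropping rows
res <- try(lm(y ~ x, data = d, na.action = na.fail), silent = TRUE)
inherits(res, "try-error")

# The printed summary includes a note about observations deleted
# due to missingness
summary(fit_omit)
```

na.exclude is convenient when the residuals or fitted values need to be lined up against the original rows afterwards.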

Of course ignoring the spatial correlation likely to be present in spatial data will mean your inferences from the parameter estimates will be quite wrong. Map the residuals. And read a good book on spatial stats...

[Edit: oh, and the data frames have to be all numbers or the whole lot gets converted to characters and then... well, who knows...]
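The conversion problem mentioned in the edit can be checked before fitting. In this sketch the data frames `num_only` and `mixed` are made-up examples: as.matrix() returns a character matrix as soon as one column is non-numeric, while an all-numeric frame stays numeric.

```r
# Hypothetical data frames for illustration
num_only <- data.frame(u = c(1, 2, 3), v = c(4, NA, 6))
mixed    <- data.frame(u = c(1, 2, 3), label = c("a", "b", "c"))

is.numeric(as.matrix(num_only))   # TRUE: safe to flatten into a vector
is.character(as.matrix(mixed))    # TRUE: everything became a string

# A simple guard to run before flattening a data frame for lm()
all_numeric <- function(df) all(vapply(df, is.numeric, logical(1)))
all_numeric(num_only)
all_numeric(mixed)
```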

Edit:

Another way of getting vectors from data frames is just to use 'unlist':

> a=data.frame(matrix(runif(16),4,4))
> b=data.frame(matrix(runif(16),4,4))
> lm(a~b)
Error in model.frame.default(formula = a ~ b, drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'a'
> lm(unlist(a)~unlist(b))

Call:
lm(formula = unlist(a) ~ unlist(b))

Coefficients:
(Intercept)    unlist(b)  
     0.6488      -0.3137
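Applied to the data from the question rather than random numbers, a sketch like the following shows how the NA values are handled once both frames are flattened. The names `ya` and `xb` are assumptions for illustration; a pair of values is dropped whenever either side of it is NA.

```r
# Data frames from the question, entered as doubles
a <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(2, NA, 2, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))
b <- data.frame(col1 = c(5, 2, 3, 4), col2 = c(2, NA, NA, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))

# Flatten column by column; use.names = FALSE avoids building element names
ya <- unlist(a, use.names = FALSE)
xb <- unlist(b, use.names = FALSE)

# Both flattening approaches give the same values in the same order
all.equal(ya, as.vector(as.matrix(a)))

fit <- lm(ya ~ xb)
nobs(fit)   # 16 pairs minus those with an NA on either side
```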

I've not seen data.matrix before, thx Gavin.
