model.matrix() with na.action=NULL?
I have a formula and a data frame, and I want to extract the model.matrix(). However, I need the resulting matrix to include the NAs found in the original dataset. If I were using model.frame() to do this, I would simply pass it na.action=NULL. However, the output I need is in model.matrix() format. Specifically, I need only the right-hand-side variables, I need the output to be a matrix (not a data frame), and I need factors to be converted to a series of dummy variables.
I'm sure I could hack something together using loops or something, but I was wondering if anyone could suggest a cleaner and more efficient workaround. Thanks a lot for your time!
And here's an example:
dat <- data.frame(matrix(rnorm(20),5,4), gl(5,2))
dat[3,5] <- NA
names(dat) <- c(letters[1:4], 'fact')
ff <- a ~ b + fact
# This omits the row with a missing observation on the factor
model.matrix(ff, dat)
# This keeps the NA, but it gives me a data frame and does not dichotomize the factor
model.frame(ff, dat, na.action=NULL)
Here is what I would like to obtain:
   (Intercept)          b fact2 fact3 fact4 fact5
1            1  0.7266086     0     0     0     0
2            1 -0.6088697     0     0     0     0
3           NA  0.4643360    NA    NA    NA    NA
4            1 -1.1666248     1     0     0     0
5            1 -0.7577394     0     1     0     0
6            1  0.7266086     0     1     0     0
7            1 -0.6088697     0     0     1     0
8            1  0.4643360     0     0     1     0
9            1 -1.1666248     0     0     0     1
10           1 -0.7577394     0     0     0     1
Answers (4)
Joris's suggestion works, but a quicker and cleaner way to do this is via the global na.action option. Setting it to na.pass achieves our goal of preserving NAs from the original dataset.
Option 1: Pass (na.pass)
The resulting matrix will contain NAs in the rows that have them in the original dataset.
Option 2: Omit (na.omit)
The resulting matrix will skip rows containing NAs.
Option 3: Fail (na.fail)
An error will occur if the original data contain NAs.
Of course, always be careful when changing global options, because they can alter the behavior of other parts of your code. A cautious person might store the original setting with something like current.na.action <- options('na.action'), and then change it back after making the model.matrix.
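
As a rough sketch of the 'Pass' option together with the save-and-restore step, assuming the example data from the question (the set.seed() call and the names old_opts, mm_pass and mm_default are my own additions for illustration):

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
ff <- a ~ b + fact

# options() returns the previous values, so they can be restored afterwards
old_opts <- options(na.action = "na.pass")
mm_pass <- model.matrix(ff, dat)   # keeps all 10 rows; NAs appear in row 3
options(old_opts)                  # put the original setting back

# With the default setting ("na.omit") the incomplete row is dropped again
mm_default <- model.matrix(ff, dat)
```

Inside a function, on.exit(options(old_opts)) is a common way to make sure the setting is restored even if an error interrupts the code.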

Another way is to use the model.frame function with the argument na.action=na.pass as the second argument to model.matrix. model.frame allows you to set the appropriate na.action, which is maintained when model.matrix is called.
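
A minimal sketch of this approach, assuming the example data from the question (set.seed() and the names mf and mm are illustrative additions, not part of the original answer):

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
ff <- a ~ b + fact

# Build the model frame first, telling it to pass NAs through untouched
mf <- model.frame(ff, dat, na.action = na.pass)

# mf already carries a "terms" attribute, so model.matrix() uses it as is
# and does not apply the global na.action to it again
mm <- model.matrix(ff, mf)
```

This leaves the global options alone, which is the main reason to prefer it over changing options(na.action = ...).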

I half-stumbled across a simpler solution after looking at mattdevlin's and Nathan Gould's answers: model.matrix.default may not support the na.action argument, but model.matrix.lm does!
(I found model.matrix.lm from RStudio's auto-complete suggestions; it appears to be the only non-default method for model.matrix if you haven't loaded any libraries that add others. Then I just guessed it might support the na.action argument.)
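
A hedged sketch of what that call might look like, assuming the example data from the question. It relies on model.matrix.lm forwarding its extra arguments to model.frame, which I believe holds in recent R versions but is worth checking on yours; set.seed() and the name mm_lm are illustrative additions:

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
ff <- a ~ b + fact

# Call the lm method directly on the formula; data and na.action are
# passed on to model.frame() through the ... argument
mm_lm <- model.matrix.lm(ff, data = dat, na.action = na.pass)
```

Note that this calls an S3 method directly with a formula rather than a fitted lm object, so it leans on an implementation detail rather than on documented behaviour.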

You can mess around a little with the model.matrix object, based on the rownames.
Alternatively, you can use contrasts() to do the work for you and construct the matrix by hand.
In any case, both methods can be incorporated in a function that can deal with more complex formulae. I leave the exercise to the reader (how I loathe that sentence when I meet it in a paper ;-) )
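
Both ideas can be sketched roughly as follows. This is a reconstruction under my own assumptions, not necessarily the code originally posted: it uses the example data from the question, and set.seed() plus the names mm, mm_full, cont and by_hand are illustrative.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
ff <- a ~ b + fact

# Method 1: re-expand the model matrix using the row names.
# Assumes the default options(na.action = "na.omit"), so row 3 is dropped.
mm <- model.matrix(ff, dat)
mm_full <- mm[match(rownames(dat), rownames(mm)), ]  # dropped rows become NA
rownames(mm_full) <- rownames(dat)
mm_full[, "b"] <- dat$b          # put back the values that were observed

# Method 2: build the dummy columns by hand from the contrast matrix.
cont <- contrasts(dat$fact)[as.integer(dat$fact), ]  # NA level gives an NA row
colnames(cont) <- paste0("fact", colnames(cont))
by_hand <- cbind("(Intercept)" = 1, b = dat$b, cont)
rownames(by_hand) <- rownames(dat)
```

The first method reproduces the layout asked for in the question (NA in the intercept column of row 3), while the second keeps the intercept at 1. Subsetting in the first method also drops the "assign" and "contrasts" attributes that model.matrix() attaches.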