What does an R multipart formula mean in mathematical terms?
The R Formula package introduces the notion of a multipart formula, such as `y ~ x1 + x2 | I(x1^2)`.

What does this formula mean mathematically? How is it different from `y ~ x1 + x2 + I(x1^2)`, or from the two independent formulas `y ~ x1 + x2` and `y ~ I(x1^2)`?
You seem to misunderstand what the Formula package is for. A multipart formula can be used to mean whatever you, as the user or developer, want it to mean. Formula provides syntactic sugar in the form of a more flexible formula notation than base R's. A multipart formula doesn't mean anything until you process it to convert the symbolic representation into model matrices or similar.
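To make that "processing" step concrete, here is a minimal base-R sketch of what can happen to `y ~ x1 + x2 | I(x1^2)` compared with `y ~ x1 + x2 + I(x1^2)`. It assumes `|` only appears at the top level of the right-hand side. The helpers `split_rhs()` and `part_matrices()` are hypothetical names written for illustration, not part of the Formula package, which does this job (and more) itself; for example, `model.matrix()` on a `Formula` object takes an `rhs` argument selecting which part to use.

```r
# Hypothetical helper (not part of the Formula package): split the RHS of a
# formula at top-level `|` operators. Because `|` binds more loosely than `+`
# and is left-associative, y ~ a + b | c | d parses as y ~ ((a + b) | c) | d.
split_rhs <- function(f) {
  rhs <- f[[3]]
  parts <- list()
  while (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    parts <- c(list(rhs[[3]]), parts)  # peel off the rightmost part
    rhs <- rhs[[2]]
  }
  c(list(rhs), parts)
}

# Hypothetical helper: build one model matrix per RHS part.
part_matrices <- function(f, data) {
  lapply(split_rhs(f), function(part) {
    one_sided <- eval(call("~", part))  # e.g. ~ x1 + x2
    model.matrix(one_sided, data = data)
  })
}

d <- data.frame(y  = c(1.2, 2.3, 2.9, 4.1, 5.2),
                x1 = c(1, 2, 3, 4, 5),
                x2 = c(0.5, 0.1, 0.9, 0.3, 0.7))

# Single-part formula: one design matrix holding every term
ncol(model.matrix(y ~ x1 + x2 + I(x1^2), data = d))
# 4 columns: (Intercept), x1, x2, I(x1^2)

# Multipart formula: one design matrix per part, each with its own intercept
mm <- part_matrices(y ~ x1 + x2 | I(x1^2), data = d)
lapply(mm, colnames)
# first part:  "(Intercept)" "x1" "x2"
# second part: "(Intercept)" "I(x1^2)"
```

The two part matrices are the same as the design matrices of the separate formulas `y ~ x1 + x2` and `y ~ I(x1^2)`. Nothing in the notation says how they should be combined; that is decided by whichever fitting function consumes them.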
The example you quote in your follow-on "Answer" is `y ~ x1 + x2 | z1 + z2 + z3`. This is for an instrumental variables (IV) model fitted by two-stage OLS. The part after the `|` (`z1 + z2 + z3`) is interpreted by the `ivcoef()` function as the instruments, whilst the part to the left of the `|` (`x1 + x2`) is interpreted as the regression covariates. `ivcoef()` builds two model matrices from these parts of the right-hand side (RHS) of the formula, which lets it fit the two-stage OLS. Formula provides the code to handle and manipulate these multipart formulas; it doesn't specify what statistical models they are used to represent.
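As a rough illustration of what an IV consumer does with the two RHS parts, here is a minimal base-R sketch of two-stage least squares. `tsls()` is a hypothetical helper, not the code of `ivcoef()`. It assumes the second part lists every instrument (so any exogenous regressor would have to appear there as well) and uses simulated data in which both `x1` and `x2` are endogenous. Stage 1 projects the columns of the regressor matrix X onto the column space of the instrument matrix Z; stage 2 regresses y on those fitted values.

```r
# Hypothetical two-stage least squares helper (not the ivcoef() function
# mentioned in the answer). X: regressors from RHS part 1; Z: instruments
# from RHS part 2, with at least as many columns as X.
tsls <- function(y, X, Z) {
  # Stage 1: project every column of X onto the column space of Z
  X_hat <- qr.fitted(qr(Z), X)
  # Stage 2: least-squares regression of y on the stage-1 fitted values
  beta <- qr.solve(X_hat, y)
  setNames(drop(beta), colnames(X))
}

set.seed(42)
n  <- 1000
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n)
u  <- rnorm(n)                           # unobserved confounder
x1 <- z1 + z2 + u + rnorm(n)             # endogenous regressors:
x2 <- z2 + z3 - u + rnorm(n)             #   both depend on u
y  <- 1 + 2 * x1 - x2 + u + rnorm(n)     # true coefficients: 1, 2, -1
d  <- data.frame(y, x1, x2, z1, z2, z3)

# The two RHS parts of y ~ x1 + x2 | z1 + z2 + z3, built by hand
X <- model.matrix(~ x1 + x2, data = d)
Z <- model.matrix(~ z1 + z2 + z3, data = d)

tsls(d$y, X, Z)              # should land near 1, 2, -1
coef(lm(y ~ x1 + x2, d))     # plain OLS, pulled away from the truth by u
```

If Z is replaced by X itself (every regressor instruments itself), stage 1 returns X unchanged and the result reduces to ordinary OLS, which is a handy sanity check.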
Another example is the `hurdle()` function in package pscl, which uses the Formula functionality. In hurdle models, the same formula `y ~ x1 + x2 | z1 + z2 + z3` would be interpreted differently: the `z1 + z2 + z3` part would be used for the zero hurdle (the binomial part of the hurdle model), whilst `x1 + x2` would be used for the count part.

My point is that, if you are building the software, you can interpret a multipart formula however you wish. If you are the user, you need to understand the model being fitted before you can interpret the multipart formula in terms of the statistical model. As such, there isn't an answer to your question: a multipart formula has no single meaning in mathematical terms.
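To show the same two RHS parts playing different roles, here is a minimal base-R sketch of a Poisson hurdle model with a logit hurdle, fitted to simulated data. `hurdle_sketch()` is a hypothetical name; it is not `pscl::hurdle()` and omits most of what that function provides. It relies on the fact that, when the two parts have separate parameters, the hurdle likelihood factorises into a binary piece and a zero-truncated count piece, so each can be maximised on its own.

```r
# Hypothetical hurdle-model sketch (not pscl::hurdle()). Roles of the parts:
#   Z (RHS part 2) -> zero hurdle: logit model for P(y > 0)
#   X (RHS part 1) -> count part: zero-truncated Poisson for y given y > 0
hurdle_sketch <- function(y, X, Z) {
  zero_fit <- glm.fit(Z, as.numeric(y > 0), family = binomial())

  pos <- y > 0
  Xp <- X[pos, , drop = FALSE]
  yp <- y[pos]
  # Negative log-likelihood of a zero-truncated Poisson with log link:
  # log P(y | y > 0) = y*eta - lambda - log(y!) - log(1 - exp(-lambda))
  negll <- function(beta) {
    eta <- drop(Xp %*% beta)
    lambda <- exp(eta)
    -sum(yp * eta - lambda - lgamma(yp + 1) - log(-expm1(-lambda)))
  }
  count_fit <- optim(rep(0, ncol(Xp)), negll, method = "BFGS")

  list(zero      = setNames(zero_fit$coefficients, colnames(Z)),
       count     = setNames(count_fit$par, colnames(X)),
       converged = zero_fit$converged && count_fit$convergence == 0)
}

set.seed(7)
n  <- 3000
x1 <- rnorm(n); x2 <- rnorm(n)
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n)
p_pos  <- plogis(0.5 + z1 - 0.5 * z2 + 0.3 * z3)   # P(y > 0)
lambda <- exp(0.3 + 0.5 * x1 - 0.4 * x2)           # untruncated Poisson mean
# Draw zero-truncated Poisson counts by redrawing any zeros
counts <- rpois(n, lambda)
while (any(zero <- counts == 0)) counts[zero] <- rpois(sum(zero), lambda[zero])
y <- ifelse(rbinom(n, 1, p_pos) == 1, counts, 0)
d <- data.frame(y, x1, x2, z1, z2, z3)

# Same formula shape as before: y ~ x1 + x2 | z1 + z2 + z3
X <- model.matrix(~ x1 + x2, data = d)        # RHS part 1
Z <- model.matrix(~ z1 + z2 + z3, data = d)   # RHS part 2
hurdle_sketch(d$y, X, Z)
```

Compared with the IV sketch, the matrices X and Z are built identically; only the fitting function decides that Z now drives the zero hurdle instead of acting as a set of instruments.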