Multiple linear regression
I am trying to use GLSMultipleLinearRegression (from the Apache commons-math package) for multiple linear regression. It expects a covariance matrix as input, and I am not sure how to compute it. I have one array for the dependent variable and 3 arrays for the independent variables.
Any idea how to compute the covariance matrix?
Note: I have 200 items for each of the 3 independent variables.
Thanks
Bharani
Comments (6)
If you do not know the covariance between the errors, you can take an iterative approach. First fit Ordinary Least Squares (OLS), compute the residuals, and estimate the covariances between the errors from them. Then apply GLS using that estimated covariance matrix and re-estimate the covariance matrix from the new residuals. Keep iterating GLS with the updated covariance matrix until the estimates converge. Here is a link (.pdf warning) to an example of this method, along with a related discussion of Weighted and Iteratively Weighted Least Squares, which cover the case where the errors are uncorrelated, unlike the general case that GLS handles.
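The iteration described above can be written out roughly as follows. This is only a sketch: the symbols are my own, and it assumes the error covariance has some low-dimensional structure with parameters θ (for example a diagonal matrix or an AR(1) pattern), because a single residual vector of length n cannot pin down an unrestricted n×n matrix.

```latex
\begin{align*}
\hat\beta^{(0)} &= (X^{\top} X)^{-1} X^{\top} y
  && \text{(OLS starting point)} \\
e^{(k)} &= y - X \hat\beta^{(k)}
  && \text{(residuals at iteration } k\text{)} \\
\hat\Omega^{(k)} &= \Omega\!\left(\hat\theta\big(e^{(k)}\big)\right)
  && \text{(covariance estimate under the assumed structure)} \\
\hat\beta^{(k+1)} &= \left(X^{\top} \big(\hat\Omega^{(k)}\big)^{-1} X\right)^{-1}
                     X^{\top} \big(\hat\Omega^{(k)}\big)^{-1} y
  && \text{(GLS step)}
\end{align*}
```

Iteration stops once the change between successive coefficient vectors falls below a chosen tolerance.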
Just came across the Flanagan library, which does this out of the box. I also got a mail from the commons user list saying that commons-math does not currently support FGLS, i.e. automatic estimation of the covariance matrix.
-Bharani
If you have no idea of the covariance between the errors, I would use Ordinary Least Squares (OLS) instead of Generalized Least Squares (GLS). This amounts to taking the identity matrix as the covariance matrix. The library appears to implement OLS in OLSMultipleLinearRegression.
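To make the identity-matrix remark concrete: with Ω = I the GLS estimator (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y collapses to (XᵀX)⁻¹Xᵀy, which is plain OLS. Below is a minimal, library-free sketch of that estimator via the normal equations. The class and method names (OlsSketch, fit) are made up for illustration, and adding an intercept column is an assumption on my part. As far as I recall, commons-math exposes the same computation through OLSMultipleLinearRegression's newSampleData(y, x) and estimateRegressionParameters(), so that is what you would normally call.

```java
import java.util.Arrays;

final class OlsSketch {
    /** Returns [intercept, b1, ..., bp] minimising the sum of squared residuals. */
    static double[] fit(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length + 1;               // +1 for the intercept column
        double[][] a = new double[p][p + 1];   // augmented normal equations [X'X | X'y]
        for (int i = 0; i < n; i++) {
            double[] row = new double[p];
            row[0] = 1.0;                      // intercept term
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) a[j][k] += row[j] * row[k];
                a[j][p] += row[j] * y[i];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12)
                throw new ArithmeticException("X'X is singular (collinear regressors?)");
            for (int r = 0; r < p; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int k = col; k <= p; k++) a[r][k] -= f * a[col][k];
            }
        }
        double[] beta = new double[p];
        for (int j = 0; j < p; j++) beta[j] = a[j][p] / a[j][j];
        return beta;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 1 + 2*x1 + 3*x2 - x3 (no noise).
        double[][] x = { {1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {1, 1, 0}, {1, 2, 3}, {2, 1, 1} };
        double[] y = { 3, 4, 0, 6, 6, 7 };
        System.out.println(Arrays.toString(fit(x, y)));
    }
}
```

For 200 observations and 3 regressors the normal equations are only 4x4, so this direct solve is cheap; a library implementation based on a QR decomposition is usually the more numerically robust choice.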
Have you tried creating a Covariance matrix directly from your data?
Using the information in the comment, we know that there are 3 independent variables, 1 dependent variable, and 200 samples. That implies that you will have a data array with 4 columns and 200 rows. The end result will look something like this (typing everything out explicitly in order to try to explain what I mean):
Then, when you're doing your actual regression, you have your covariance matrix:
@Mark Lavin
I'm a bit confused. Since we have only one response variable, the residual errors should be a one-dimensional variable. Where does a covariance matrix of the errors fit in, then?
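One way to read this, offered as a sketch rather than a definitive answer: there is a single response variable, but it is observed 200 times, and each observation carries its own error term. The covariance matrix in GLS describes how those 200 error terms vary together, so it is 200x200 rather than 1x1. In the notation below (my own, assuming zero-mean errors):

```latex
\begin{align*}
y &= X\beta + \varepsilon, \qquad y,\ \varepsilon \in \mathbb{R}^{n}, \quad n = 200 \\
\Omega &= \operatorname{Cov}(\varepsilon)
        = \mathbb{E}\!\left[\varepsilon \varepsilon^{\top}\right] \in \mathbb{R}^{n \times n},
\qquad \Omega_{ij} = \operatorname{Cov}(\varepsilon_i, \varepsilon_j)
\end{align*}
```

The diagonal entries are the error variances of the individual observations, and the off-diagonal entries capture correlation between the errors of different observations.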
You need to organize the 3 random independent variates as column vectors in a matrix: x1, x2, x3 (N = 3 columns), where each row is an observation (M = 200 rows). This will be an MxN matrix.
You then plug this data matrix into a covariance routine provided by Apache such as:
Covariance.computeCovarianceMatrix(RealMatrix matrix).
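For reference, the quantity such a covariance routine returns can be sketched in plain Java with no library dependency. The class and method names below (SampleCovariance, covarianceMatrix) are made up for illustration, and the n - 1 denominator is an assumption that matches what I understand to be commons-math's bias-corrected default. In the commons-math versions I am aware of, the usual public entry point is new Covariance(matrix).getCovarianceMatrix() rather than a static call, so check the Javadoc for your version.

```java
import java.util.Arrays;

final class SampleCovariance {
    /** Bias-corrected (n - 1 denominator) covariance matrix of the columns of data. */
    static double[][] covarianceMatrix(double[][] data) {
        int rows = data.length;       // observations (M)
        int cols = data[0].length;    // variables (N)
        if (rows < 2) throw new IllegalArgumentException("need at least two observations");

        // Column means.
        double[] mean = new double[cols];
        for (double[] row : data)
            for (int j = 0; j < cols; j++) mean[j] += row[j] / rows;

        // Accumulate products of deviations from the column means.
        double[][] cov = new double[cols][cols];
        for (double[] row : data)
            for (int j = 0; j < cols; j++)
                for (int k = 0; k < cols; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (rows - 1);
        return cov;
    }

    public static void main(String[] args) {
        // Four observations (rows) of three variables (columns); toy numbers only.
        double[][] data = { {1, 2, 0}, {2, 4, 1}, {3, 6, 0}, {4, 8, 1} };
        System.out.println(Arrays.deepToString(covarianceMatrix(data)));
    }
}
```

Note that for 3 regressors this yields a 3x3 matrix describing the regressors themselves, whereas the covariance argument of GLSMultipleLinearRegression discussed in the other answers describes the 200 error terms.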