numpy: code for updating a least-squares fit with more observations
I am looking for a numpy-based implementation of ordinary least squares that would allow the fit to be updated with more observations. Something along the lines of Applied Statistics algorithm AS 274 or R's biglm.
Failing that, a routine for updating a QR decomposition with new rows would also be of interest.
Any pointers?
scikits.statsmodels has a recursive OLS in the sandbox that updates the inverse of X'X and could be used for this. (It is currently used only to calculate recursive OLS residuals.)
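For illustration, the idea of updating the inverse of X'X one observation at a time can be sketched in plain numpy with the Sherman-Morrison identity. This is only a minimal sketch under stated assumptions, not the statsmodels sandbox code: the class name RecursiveOLS and its update method are hypothetical, and the initial batch is assumed to have full column rank.

```python
import numpy as np


class RecursiveOLS:
    """Hypothetical helper: keeps (X'X)^-1 and beta, updated one row at a time."""

    def __init__(self, X0, y0):
        X0 = np.asarray(X0, dtype=float)
        y0 = np.asarray(y0, dtype=float)
        # Assumption: the initial batch has full column rank, so X'X is invertible.
        self.P = np.linalg.inv(X0.T @ X0)
        self.beta = self.P @ (X0.T @ y0)

    def update(self, x, y):
        """Fold in one new observation (row x, response y) and return beta."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x              # uses the inverse from before this row
        denom = 1.0 + x @ Px
        # Sherman-Morrison: (A + x x')^-1 = A^-1 - A^-1 x x' A^-1 / (1 + x' A^-1 x)
        self.P -= np.outer(Px, Px) / denom
        # Correct beta by the one-step-ahead prediction error.
        self.beta = self.beta + Px * (y - x @ self.beta) / denom
        return self.beta
```

After every row has been folded in, beta should agree with a batch fit such as np.linalg.lstsq on the full data, up to rounding. Repeated rank-one updates of an explicit inverse can accumulate rounding error on ill-conditioned problems, which is one reason QR-based updating is often preferred.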
Nathaniel Smith posted his code for OLS when the data is too large to fit in memory to the scipy-user mailing list. The main code updates X'X.
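As a rough illustration of the "update X'X" approach (a sketch of the general technique, not Nathaniel Smith's posted code), the cross-products X'X and X'y can be accumulated chunk by chunk and the normal equations solved on demand. The class name ChunkedOLS and its methods are hypothetical.

```python
import numpy as np


class ChunkedOLS:
    """Hypothetical accumulator of X'X and X'y over chunks of rows."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)
        self.n_obs = 0

    def update(self, X, y):
        """Add a chunk of rows; only the p x p and p-sized sums are kept."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        y = np.atleast_1d(np.asarray(y, dtype=float))
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n_obs += X.shape[0]

    def coef(self):
        """Solve the normal equations (X'X) beta = X'y for the rows seen so far."""
        return np.linalg.solve(self.xtx, self.xty)
```

Memory use depends only on the number of columns, so the data can be streamed from disk. The price is that forming X'X squares the condition number of the problem, so this is less robust than a QR-based approach when the columns are nearly collinear.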
I think econpy also has a function for this.
Pandas has an expanding OLS, but it may not be easy to use in an online fashion.
Nathaniel's code might be the closest to biglm. I don't think there is anything for the general linear model (error covariance different from identity).
All of these need some work before they can be used for this. I don't know of any python(-wrapped) code that would update QR.
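In the absence of a dedicated update routine, one way to get the effect of adding rows to a QR factorization with numpy alone is to keep only R and Q'y, stack the new rows underneath them, and re-factorize the small stacked matrix. The sketch below assumes that approach; it is not AS 274 or biglm's algorithm, and the class name QRUpdater is hypothetical.

```python
import numpy as np


class QRUpdater:
    """Hypothetical helper: keeps only R and Q'y, never the full Q or old rows."""

    def __init__(self, n_features):
        self.R = np.empty((0, n_features))
        self.qty = np.empty(0)

    def update(self, X, y):
        """Fold a chunk of new rows into R and Q'y."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        y = np.atleast_1d(np.asarray(y, dtype=float))
        # [R; X_new] paired with [Q'y; y_new] has the same least-squares
        # solution as all rows seen so far, so re-factorizing it is enough.
        stacked_X = np.vstack([self.R, X])
        stacked_y = np.concatenate([self.qty, y])
        Q, self.R = np.linalg.qr(stacked_X)  # default reduced mode
        self.qty = Q.T @ stacked_y

    def coef(self):
        """Solve R beta = Q'y; needs at least as many rows as columns."""
        p = self.R.shape[1]
        if self.R.shape[0] < p:
            raise ValueError("need at least as many rows as columns")
        return np.linalg.solve(self.R, self.qty)
```

Each update costs a QR of a (p + m) by p matrix, where m is the number of new rows, so it is independent of how many rows were seen earlier. Because X'X is never formed, the conditioning is that of X itself.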
Update:
See http://mail.scipy.org/pipermail/scipy-dev/2010-February/013853.html
There is an incremental QR and Cholesky available in cholmod, but I didn't try it, because of either license or compilation-on-Windows problems, and I don't think I tried to get incremental_qr to work.
See the attachments:
http://mail.scipy.org/pipermail/scipy-dev/2010-February/013844.html
You might try the pythonequations project at http://code.google.com/p/pythonequations/downloads/list. It may be more than you need, but it does use scipy and numpy. That code is the middleware for the http://zunzun.com online curve and surface fitting web site (I'm the author). The source code comes with many examples. Alternatively, the web site alone may be sufficient, so please give it a try.
This is not a detailed answer yet, but:
AFAIK, a QR update like this is not implemented in numpy, but in any case I'd like to ask you to specify in more detail what you are actually aiming for. In particular, why would it not be acceptable to just calculate a new estimate for x (of Ax = b) from the k latest observations whenever a bunch of new observations arrives? (With modern hardware, k can indeed be quite large.)
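The suggestion of simply re-solving Ax = b on the k latest observations could look roughly like the following. This is a minimal sketch, assuming a fixed-size window and a full re-fit on every request; the class name WindowedOLS and its methods are hypothetical.

```python
from collections import deque

import numpy as np


class WindowedOLS:
    """Hypothetical helper: re-fits on the k most recent observations."""

    def __init__(self, k):
        # deque(maxlen=k) silently drops the oldest entry once k is exceeded.
        self.rows = deque(maxlen=k)
        self.targets = deque(maxlen=k)

    def add(self, X, y):
        """Append one or more new observations to the window."""
        for xi, yi in zip(np.atleast_2d(X), np.atleast_1d(y)):
            self.rows.append(np.asarray(xi, dtype=float))
            self.targets.append(float(yi))

    def coef(self):
        """Solve the least-squares problem from scratch on the current window."""
        A = np.array(list(self.rows))
        b = np.array(list(self.targets))
        return np.linalg.lstsq(A, b, rcond=None)[0]
```

Note that this is not the same estimate as an updated fit over all data, since observations older than the window are discarded.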
The LSQ.F90 part of the file compiles easily enough, and it works in Python.
As soon as I figure out the function call I'll include it in this answer.