- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
GLM Estimation and IRLS
Recall that generalized linear models are models with the following components:
- A linear predictor \(\eta = X\beta\)
- A response variable with distribution in the exponential family
- An invertible ‘link’ function \(g\) such that \[E(Y) = \mu = g^{-1}(\eta)\]
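For example, logistic regression is the Bernoulli special case, with the logit link:
\[g(\mu) = \log\left(\frac{\mu}{1-\mu}\right), \qquad \mu = g^{-1}(\eta) = \frac{1}{1+e^{-\eta}}\]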
In the Bernoulli case, we may write the log-likelihood:
\[\ell(\eta) = \sum\limits_{i=1}^m \left(y_i \log(\eta_i) + (1 - y_i)\log(1-\eta_i)\right)\]
where \(\eta_i = \eta(x_i,\beta)\).
Differentiating, we obtain:
\[\frac{\partial L}{\partial \beta} = \frac{\partial \eta}{\partial \beta}^T\frac{\partial L}{\partial \eta} = 0\]
Written slightly differently from the previous sections, the Newton update to find \(\beta\) is:
\[-\frac{\partial^2 L}{\partial \beta \beta^T} \left(\beta_{k+1} -\beta_k\right) = \frac{\partial \eta}{\partial \beta}^T\frac{\partial L}{\partial \eta}\]
Now, if we compute:
\[-\frac{\partial^2 L}{\partial \beta \beta^T} = -\sum_i \frac{\partial L}{\partial \eta_i}\frac{\partial^2 \eta_i}{\partial \beta \beta^T} - \frac{\partial \eta}{\partial \beta}^T \frac{\partial^2 L}{\partial \eta \eta^T} \frac{\partial \eta}{\partial \beta}\]
Taking expected values on the right hand side and noting:
\[E\left(\frac{\partial L}{\partial \eta_i} \right) = 0\]
and
\[E\left(-\frac{\partial^2 L}{\partial \eta \eta^T} \right) = E\left(\frac{\partial L}{\partial \eta}\frac{\partial L}{\partial \eta}^T\right) \equiv A\]
we see that the first term vanishes in expectation. So if we replace the Hessian in Newton’s method with its expected value, we obtain:
\[\frac{\partial \eta}{\partial \beta}^TA\frac{\partial \eta}{\partial \beta}\left(\beta_{k+1} -\beta_k\right) = \frac{\partial \eta}{\partial \beta}^T\frac{\partial L}{\partial \eta}\]
These are precisely the normal equations for the following weighted least squares problem:
\[\min_{\beta_{k+1}}\left(A^{-1}\frac{\partial L}{\partial \eta} - \frac{\partial \eta}{\partial \beta}\left(\beta_{k+1} -\beta_k\right)\right)^T A \left(A^{-1}\frac{\partial L}{\partial \eta} - \frac{\partial \eta}{\partial \beta}\left(\beta_{k+1} -\beta_k\right)\right)\]
\(A\) is a weight matrix that changes from one iteration to the next; hence the name iteratively reweighted least squares (IRLS).
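To make the update concrete, here is a minimal IRLS sketch for the logistic-regression case using numpy. It is an illustration under stated assumptions rather than code from the text: the function name `irls_logistic` and the simulated data are ours, and for the canonical logit link the weight matrix \(A\) reduces to \(\mathrm{diag}(\mu_i(1-\mu_i))\) while the score \(\partial L/\partial \eta\) becomes \(y - \mu\), so each iteration solves exactly the normal equations above.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS (Fisher scoring).

    Each step solves the weighted least squares normal equations
        (X^T A X) (beta_new - beta) = X^T (y - mu),
    with A = diag(mu * (1 - mu)) the expected-information weights.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor
        mu = 1 / (1 + np.exp(-eta))     # inverse logit link
        a = mu * (1 - mu)               # weights, recomputed every pass
        step = np.linalg.solve(X.T @ (a[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.linalg.norm(step) < tol:  # converged
            break
    return beta

# quick sanity check on simulated data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
beta_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
print(irls_logistic(X, y))  # should land near beta_true
```

Recomputing the weights `a` on every pass is the “reweighting” that gives the method its name.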
Constrained Optimization and Lagrange Multipliers
Often, we want to optimize a function subject to one or more constraints. The most common analytical technique for this is the method of Lagrange multipliers. The theory rests on the following observation:
If we wish to optimize a function \(f(x,y)\) subject to the constraint \(g(x,y)=c\), we are really looking for points at which the gradient of \(f\) and the gradient of \(g\) are parallel. This amounts to:
\[\nabla_{(x,y)}f = \lambda \nabla_{(x,y)}g\]
(often, this is written with a minus sign in front of \(\lambda\)). The 2-d problem above defines two equations in three unknowns. The original constraint, \(g(x,y)=c\), yields a third equation. Multiple constraints are handled by requiring:
\[\nabla_{(x,y)}f = \lambda_1 \nabla_{(x,y)}g_1 + ... + \lambda_k \nabla_{(x,y)}g_k\]
The generalization to functions on \(\mathbb{R}^n\) is also straightforward:
\[\nabla_{x}f = \lambda \nabla_{x}g\]
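As a small worked sketch (the objective and constraint here are toy choices of ours, not from the text), sympy can solve the Lagrange conditions directly, e.g. extremizing \(f(x,y) = x + y\) subject to \(x^2 + y^2 = 1\):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y               # objective
g = x**2 + y**2 - 1     # constraint, written as g(x, y) = 0

# Lagrange conditions: grad f = lambda * grad g, plus the constraint itself
eqs = [sp.diff(f, v) - lam * sp.diff(g, v) for v in (x, y)] + [g]
print(sp.solve(eqs, [x, y, lam], dict=True))
# two critical points, (1/sqrt(2), 1/sqrt(2)) and (-1/sqrt(2), -1/sqrt(2));
# the first maximizes f and the second minimizes it
```

For purely numerical problems, `scipy.optimize.minimize` accepts equality constraints (e.g. with `method='SLSQP'`) and handles the multipliers internally.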