
Optimization of standard statistical models


When we solve standard statistical problems, an optimization procedure similar to the ones discussed here is performed. For example, consider multivariate logistic regression - typically, a Newton-like algorithm known as iteratively reweighted least squares (IRLS) is used to find the maximum likelihood estimate for the generalized linear model family. However, using one of the multivariate scalar minimization methods shown above will also work - for example, the BFGS minimization algorithm.

The take-home message is that there is nothing magic going on when Python or R fits a statistical model using a formula - all that happens is that the objective function is set to be the negative of the log likelihood, and the minimum is found using some first or second order optimization algorithm.
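To make the recipe concrete before turning to logistic regression, here is a minimal toy sketch (made-up counts, not part of the original example): estimating a Poisson rate by handing the negative log-likelihood to a generic scipy.optimize routine.

import numpy as np
import scipy.optimize as opt
from scipy.special import gammaln

counts = np.array([2, 1, 4, 0, 3, 2, 5, 1])   # made-up observed counts

def neg_log_lik(lam, y):
    """Negative Poisson log-likelihood for rate lam."""
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

res = opt.minimize_scalar(neg_log_lik, bounds=(1e-6, 20), args=(counts,), method='bounded')
print(res.x, counts.mean())   # both are about 2.25 - the Poisson MLE is the sample mean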

# pandas, numpy and scipy.optimize are used in the examples below
import numpy as np
import pandas as pd
import scipy.optimize as opt
import statsmodels.api as sm

Logistic regression as optimization

Suppose we have a binary outcome measure \(Y \in \{0,1\}\) that is conditional on some input variable (vector) \(x \in (-\infty, +\infty)\). Let the conditional probability be \(p(x) = P(Y=1 \mid X=x)\). Given some data, one simple probability model is \(p(x) = \beta_0 + x\cdot\beta\) - i.e. linear regression. This doesn’t really work, for the obvious reason that \(p(x)\) must be between 0 and 1 as \(x\) ranges across the real line. One simple way to fix this is to model the log odds instead, \(g(x) = \log\frac{p(x)}{1 - p(x)} = \beta_0 + x\cdot\beta\). Solving for \(p\), we get

\[p(x) = \frac{1}{1 + e^{-(\beta_0 + x\cdot\beta)}}\]
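The algebra behind that step, spelled out: exponentiating the log odds gives

\[\frac{p(x)}{1 - p(x)} = e^{\beta_0 + x\cdot\beta}\]

and solving for \(p(x)\) yields

\[p(x) = \frac{e^{\beta_0 + x\cdot\beta}}{1 + e^{\beta_0 + x\cdot\beta}} = \frac{1}{1 + e^{-(\beta_0 + x\cdot\beta)}}\]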

As you all know very well, this is logistic regression.

Suppose we have \(n\) data points \((x_i, y_i)\) where \(x_i\) is a vector of features and \(y_i\) is an observed class (0 or 1). For each event, we either have “success” (\(y = 1\)) or “failure” (\(y = 0\)), so the likelihood looks like the product of Bernoulli random variables. According to the logistic model, the contribution of observation \(i\) to the likelihood is \(p(x_i)\) if \(y_i = 1\) and \(1-p(x_i)\) if \(y_i = 0\). So the likelihood is

\[L(\beta_0, \beta) = \prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\]

and the log-likelihood is

\[\ell(\beta_0, \beta) = \sum_{i=1}^n \left[ y_i \log p(x_i) + (1-y_i)\log\bigl(1-p(x_i)\bigr) \right]\]

Using the standard ‘trick’, if we augment the matrix \(X\) with a column of 1s, we can write \(\beta_0 + x_i\cdot\beta\) as just \(X\beta\).
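Substituting the logistic form of \(p(x_i)\) into the log-likelihood and simplifying (with \(x_i\) now a row of the augmented \(X\)), the terms collapse to

\[\ell(\beta) = \sum_{i=1}^n \left[ y_i \, x_i\cdot\beta - \log\left(1 + e^{x_i\cdot\beta}\right) \right]\]

which is, up to sign, exactly the function minimized in the home-brew example at the end of this section.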

df_ = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
df_.head()
   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4
# We will ignore the rank categorical value

cols_to_keep = ['admit', 'gre', 'gpa']
df = df_[cols_to_keep]
df.insert(1, 'dummy', 1)
df.head()
   admit  dummy  gre   gpa
0      0      1  380  3.61
1      1      1  660  3.67
2      1      1  800  4.00
3      1      1  640  3.19
4      0      1  520  2.93

Solving as a GLM with IRLS

This is very similar to what you would do in R, only using Python’s statsmodels package. The GLM solver uses a special variant of Newton’s method known as iteratively reweighted least squares (IRLS), which will be further described in the lecture on multivariate and constrained optimization.
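For intuition, here is a bare-bones sketch of what an IRLS pass looks like for logistic regression. The function name and the stripped-down loop are my own illustration, not statsmodels' actual implementation, which adds prior weights, offsets and convergence checks.

def irls_logistic(X, y, niter=10):
    """Illustrative IRLS for logistic regression: repeated weighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(niter):
        p = 1 / (1 + np.exp(-X @ beta))      # fitted probabilities at current beta
        w = p * (1 - p)                      # working weights (diagonal of W)
        z = X @ beta + (y - p) / w           # working (adjusted) response
        # weighted least squares update: solve (X^T W X) beta = X^T W z
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

# e.g. irls_logistic(df[['dummy', 'gre', 'gpa']].values, df['admit'].values)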

model = sm.GLM.from_formula('admit ~ gre + gpa',
                            data=df, family=sm.families.Binomial())
fit = model.fit()
fit.summary()
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                            GLM   Df Residuals:                      397
Model Family:                Binomial   Df Model:                            2
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -240.17
Date:                Wed, 11 Feb 2015   Deviance:                       480.34
Time:                        17:29:26   Pearson chi2:                     398.
No. Iterations:                     5
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -4.9494      1.075     -4.604      0.000        -7.057    -2.842
gre            0.0027      0.001      2.544      0.011         0.001     0.005
gpa            0.7547      0.320      2.361      0.018         0.128     1.381
==============================================================================

Solving as a logistic model with BFGS

Note that you can choose any of the scipy.optimize algorithms to fit the maximum likelihood model. Because statsmodels supplies the analytic gradient (and Hessian) of the log likelihood, this will be more accurate than the home-brew version below, which relies on numerical approximations to the derivatives.

model2 = sm.Logit.from_formula('admit ~ %s' % '+'.join(df.columns[2:]), data=df)
fit2 = model2.fit(method='bfgs', maxiter=100)
fit2.summary()
Optimization terminated successfully.
         Current function value: 0.600430
         Iterations: 23
         Function evaluations: 65
         Gradient evaluations: 54
                           Logit Regression Results
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                          Logit   Df Residuals:                      397
Method:                           MLE   Df Model:                            2
Date:                Wed, 11 Feb 2015   Pseudo R-squ.:                 0.03927
Time:                        17:31:19   Log-Likelihood:                -240.17
converged:                       True   LL-Null:                       -249.99
                                        LLR p-value:                 5.456e-05
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -4.9494      1.075     -4.604      0.000        -7.057    -2.842
gre            0.0027      0.001      2.544      0.011         0.001     0.005
gpa            0.7547      0.320      2.361      0.018         0.128     1.381
==============================================================================

Home-brew logistic regression using a generic minimization function

This is to show that there is no magic going on - you can write the function to minimize directly from the log-likelihood equation and run a minimizer. It will be more accurate if you also provide the derivative (and the Hessian for second order methods), but using just the function and numerical approximations to the derivative will also work. As usual, this is for illustration so you understand what is going on - when there is a library function available, you should probably use that instead.

def f(beta, y, x):
    """Minus log likelihood function for logistic regression."""
    eta = np.dot(x, beta)                       # linear predictor X beta
    return -((y * eta).sum() - np.log(1 + np.exp(eta)).sum())

beta0 = np.zeros(3)
opt.minimize(f, beta0, args=(df['admit'], df.loc[:, 'dummy':]), method='BFGS', options={'gtol': 1e-2})
  status: 0
 success: True
    njev: 16
    nfev: 80
hess_inv: array([[  1.1525e+00,  -2.7800e-04,  -2.8160e-01],
      [ -2.7800e-04,   1.1663e-06,  -1.2190e-04],
      [ -2.8160e-01,  -1.2190e-04,   1.0259e-01]])
     fun: 240.1719908951104
       x: array([ -4.9493e+00,   2.6903e-03,   7.5473e-01])
 message: 'Optimization terminated successfully.'
     jac: array([  9.1553e-05,  -3.2158e-03,   4.5776e-04])
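As noted above, supplying the derivative improves accuracy. A sketch of the analytic gradient for the same objective (the name grad_f is my own) and how it could be passed to the minimizer:

def grad_f(beta, y, x):
    """Gradient of the minus log likelihood: X^T (p - y)."""
    p = 1 / (1 + np.exp(-np.dot(x, beta)))      # fitted probabilities
    return np.dot(x.T, p - y)

opt.minimize(f, beta0, args=(df['admit'], df.loc[:, 'dummy':]),
             jac=grad_f, method='BFGS', options={'gtol': 1e-2})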

