
Using statsmodels

Published 2025-02-25 23:43:40

Many of the basic statistical tools available in R are replicated in the statsmodels package. We will show only one example: a logistic regression fitted first with Logit and then with GLM.

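import numpy as np
import statsmodels.api as sm
from pandas import DataFrame

# Simulate the genotype for 4 SNPs in a case-control study using an additive genetic model
n = 1000
status = np.random.choice([0, 1], n)
genotype = np.empty((n, 4), dtype=int)  # every row is filled in below
# Rows with status 0: genotypes 0, 1, 2 are roughly equally likely
genotype[status == 0] = np.random.choice([0, 1, 2], ((status == 0).sum(), 4), p=[0.33, 0.33, 0.34])
# Rows with status 1: genotype 2 is more frequent
genotype[status == 1] = np.random.choice([0, 1, 2], ((status == 1).sum(), 4), p=[0.2, 0.3, 0.5])
df = DataFrame(np.hstack([status[:, np.newaxis], genotype]), columns=['status', 'SNP1', 'SNP2', 'SNP3', 'SNP4'])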
df.head(6)
   status  SNP1  SNP2  SNP3  SNP4
0       0     2     1     2     0
1       1     1     0     2     2
2       1     0     1     2     1
3       1     2     2     1     2
4       1     1     2     0     1
5       1     0     0     1     2
# Use statsmodels to fit a logistic regression to the data
fit1 = sm.Logit.from_formula('status ~ %s' % '+'.join(df.columns[1:]), data=df).fit()
fit1.summary()
Optimization terminated successfully.
         Current function value: 0.642824
         Iterations 5
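                           Logit Regression Results
==============================================================================
Dep. Variable:                 status   No. Observations:                 1000
Model:                          Logit   Df Residuals:                      995
Method:                           MLE   Df Model:                            4
Date:                Thu, 22 Jan 2015   Pseudo R-squ.:                 0.07259
Time:                        15:34:43   Log-Likelihood:                -642.82
converged:                       True   LL-Null:                       -693.14
                                        LLR p-value:                 7.222e-21
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -1.7409      0.203     -8.560      0.000        -2.140    -1.342
SNP1           0.4306      0.083      5.173      0.000         0.267     0.594
SNP2           0.3155      0.081      3.882      0.000         0.156     0.475
SNP3           0.2255      0.082      2.750      0.006         0.065     0.386
SNP4           0.5341      0.083      6.404      0.000         0.371     0.698
==============================================================================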
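The coefficients in the Logit summary are on the log-odds scale, so exponentiating them gives odds ratios per unit increase in the genotype code. The following is a minimal sketch using only the standard library; the numbers are copied by hand from the summary above, and the names `coefs` and `odds_ratios` are illustrative, not part of statsmodels.

```python
import math

# (coef, lower 95% limit, upper 95% limit) copied from the Logit summary.
# These are log-odds per unit increase in the 0/1/2 genotype code.
coefs = {
    "SNP1": (0.4306, 0.267, 0.594),
    "SNP2": (0.3155, 0.156, 0.475),
    "SNP3": (0.2255, 0.065, 0.386),
    "SNP4": (0.5341, 0.371, 0.698),
}

# Exponentiate each value to move from log-odds to odds ratios.
odds_ratios = {
    name: tuple(math.exp(value) for value in values)
    for name, values in coefs.items()
}

for name, (ratio, lower, upper) in odds_ratios.items():
    print(f"{name}: OR = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With the fitted object at hand, the same quantities can usually be obtained directly with `np.exp(fit1.params)` and `np.exp(fit1.conf_int())`.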
# Alternative using GLM - similar to R
fit2 = sm.GLM.from_formula('status ~ SNP1 + SNP2 + SNP3 + SNP4', data=df, family=sm.families.Binomial()).fit()
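from scipy.stats import chi2  # chisqprob was removed from SciPy; chi2.sf is its replacement
print(fit2.summary())
# p-value of the likelihood-ratio test against the intercept-only model
print(chi2.sf(fit2.null_deviance - fit2.deviance, fit2.df_model))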
print(fit2.null_deviance - fit2.deviance, fit2.df_model)
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                 status   No. Observations:                 1000
Model:                            GLM   Df Residuals:                      995
Model Family:                Binomial   Df Model:                            4
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -642.82
Date:                Thu, 22 Jan 2015   Deviance:                       1285.6
Time:                        15:34:43   Pearson chi2:                 1.01e+03
No. Iterations:                     5
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -1.7409      0.203     -8.560      0.000        -2.140    -1.342
SNP1           0.4306      0.083      5.173      0.000         0.267     0.594
SNP2           0.3155      0.081      3.882      0.000         0.156     0.475
SNP3           0.2255      0.082      2.750      0.006         0.065     0.386
SNP4           0.5341      0.083      6.404      0.000         0.371     0.698
==============================================================================
7.22229516479e-21
(100.63019840179481, 4)
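The numbers printed by the two fits are related: the deviance difference equals twice the gap between the model and null log-likelihoods, its chi-square tail probability is the LLR p-value, and the pseudo R-squared is one minus the ratio of the two log-likelihoods. The sketch below checks this with the standard library only. The inputs are copied from the output above, and `chi2_sf_even_df` is a hypothetical helper written for this illustration (a closed form valid only for even degrees of freedom), not a statsmodels or SciPy function.

```python
import math

# Values copied from the summaries above.
ll_full = -642.82   # Log-Likelihood of the fitted model
ll_null = -693.14   # LL-Null (intercept-only model)
df_model = 4
deviance_diff = 100.63019840179481  # fit2.null_deviance - fit2.deviance

# Likelihood-ratio statistic rebuilt from the (rounded) log-likelihoods.
lr_stat = 2 * (ll_full - ll_null)

# McFadden's pseudo R-squared.
pseudo_r2 = 1 - ll_full / ll_null


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(deviance_diff, df_model)

print(lr_stat, pseudo_r2, p_value)
```

Because the log-likelihoods in the summary are rounded to two decimals, `lr_stat` agrees with the printed deviance difference only to about two decimal places.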
