- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Using statsmodels
Many of the basic statistical tools available in R are replicated in the statsmodels
package. We will only show one example.
```python
import numpy as np
from pandas import DataFrame

# Simulate the genotype for 4 SNPs in a case-control study
# using an additive genetic model
n = 1000
status = np.random.choice([0, 1], n)
genotype = np.random.choice([0, 1, 2], (n, 4))
genotype[status == 0] = np.random.choice([0, 1, 2], (sum(status == 0), 4),
                                         p=[0.33, 0.33, 0.34])
genotype[status == 1] = np.random.choice([0, 1, 2], (sum(status == 1), 4),
                                         p=[0.2, 0.3, 0.5])
df = DataFrame(np.hstack([status[:, np.newaxis], genotype]),
               columns=['status', 'SNP1', 'SNP2', 'SNP3', 'SNP4'])
df.head(6)
```
| | status | SNP1 | SNP2 | SNP3 | SNP4 |
|---|---|---|---|---|---|
| 0 | 0 | 2 | 1 | 2 | 0 |
| 1 | 1 | 1 | 0 | 2 | 2 |
| 2 | 1 | 0 | 1 | 2 | 1 |
| 3 | 1 | 2 | 2 | 1 | 2 |
| 4 | 1 | 1 | 2 | 0 | 1 |
| 5 | 1 | 0 | 0 | 1 | 2 |
```python
import statsmodels.api as sm

# Use statsmodels to fit a logistic regression to the data
fit1 = sm.Logit.from_formula('status ~ %s' % '+'.join(df.columns[1:]),
                             data=df).fit()
fit1.summary()
```
```
Optimization terminated successfully.
         Current function value: 0.642824
         Iterations 5
```
| Dep. Variable: | status | No. Observations: | 1000 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 995 |
| Method: | MLE | Df Model: | 4 |
| Date: | Thu, 22 Jan 2015 | Pseudo R-squ.: | 0.07259 |
| Time: | 15:34:43 | Log-Likelihood: | -642.82 |
| converged: | True | LL-Null: | -693.14 |
| | | LLR p-value: | 7.222e-21 |
| | coef | std err | z | P>\|z\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| Intercept | -1.7409 | 0.203 | -8.560 | 0.000 | -2.140  -1.342 |
| SNP1 | 0.4306 | 0.083 | 5.173 | 0.000 | 0.267  0.594 |
| SNP2 | 0.3155 | 0.081 | 3.882 | 0.000 | 0.156  0.475 |
| SNP3 | 0.2255 | 0.082 | 2.750 | 0.006 | 0.065  0.386 |
| SNP4 | 0.5341 | 0.083 | 6.404 | 0.000 | 0.371  0.698 |
```python
from scipy.stats import chisqprob

# Alternative using GLM - similar to R
fit2 = sm.GLM.from_formula('status ~ SNP1 + SNP2 + SNP3 + SNP4',
                           data=df, family=sm.families.Binomial()).fit()
print(fit2.summary())
# Likelihood-ratio test of the full model against the null model
print(chisqprob(fit2.null_deviance - fit2.deviance, fit2.df_model))
print((fit2.null_deviance - fit2.deviance, fit2.df_model))
```
```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                 status   No. Observations:                 1000
Model:                            GLM   Df Residuals:                      995
Model Family:                Binomial   Df Model:                            4
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -642.82
Date:                Thu, 22 Jan 2015   Deviance:                       1285.6
Time:                        15:34:43   Pearson chi2:                 1.01e+03
No. Iterations:                     5
==============================================================================
                 coef    std err          t      P>|t|   [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -1.7409      0.203     -8.560      0.000      -2.140    -1.342
SNP1           0.4306      0.083      5.173      0.000       0.267     0.594
SNP2           0.3155      0.081      3.882      0.000       0.156     0.475
SNP3           0.2255      0.082      2.750      0.006       0.065     0.386
SNP4           0.5341      0.083      6.404      0.000       0.371     0.698
==============================================================================
7.22229516479e-21
(100.63019840179481, 4)
```
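`scipy.stats.chisqprob`, used above, has been removed from recent SciPy releases; the chi-squared survival function `chi2.sf` computes the identical upper-tail probability. A minimal check, plugging in the deviance difference and degrees of freedom printed in the output above:

```python
from scipy.stats import chi2

# chisqprob(x, df) was simply the chi-squared survival function;
# chi2.sf reproduces the LR-test p-value reported above
lr_stat, df_model = 100.63019840179481, 4
p_value = chi2.sf(lr_stat, df_model)
print(p_value)  # ~7.222e-21, matching the chisqprob output
```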