- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Hierarchical models
Hierarchical models have the following structure: first we specify that the data come from a distribution with parameters \(\theta\)
\[X \sim f(X\ | \ \theta)\]
and that the parameters themselves come from another distribution with hyperparameters \(\lambda\)
\[\theta \sim g(\theta \ | \ \lambda)\]
and finally that \(\lambda\) comes from a prior distribution
\[\lambda \sim h(\lambda)\]
More levels of hierarchy are possible - i.e. you can specify hyper-hyperparameters for the distribution of \(\lambda\), and so on.
The essential idea of the hierarchical model is that because the \(\theta\)s are not independent but rather are drawn from a common distribution with parameter \(\lambda\), we can share information across the \(\theta\)s by also estimating \(\lambda\) at the same time.
As an example, suppose we have data about the proportion of heads after some number of tosses from several coins, and we want to estimate the bias of each coin. We also know that the coins come from the same mint and so might share some common manufacturing defect. There are two extreme approaches - we could estimate the bias of each coin from its coin toss data independently of all the others, or we could pool the results together and estimate the same bias for all coins. Hierarchical models provide a compromise where we shrink individual estimates towards a common estimate.
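A minimal numerical sketch of these three approaches, using made-up coin data and a hand-picked shrinkage strength (a full hierarchical model would instead let the data determine how much to shrink):

```python
import numpy as np

# Hypothetical data: heads and total tosses for four coins from the same mint.
heads = np.array([3.0, 7.0, 12.0, 6.0])
tosses = np.array([10.0, 10.0, 20.0, 15.0])

independent = heads / tosses                # estimate each coin's bias on its own
pooled = heads.sum() / tosses.sum()         # a single bias shared by all coins
kappa = 10.0                                # hand-picked strength of shrinkage
partial = (heads + kappa * pooled) / (tosses + kappa)  # shrink towards the pooled value

print(independent)   # 0.3, 0.7, 0.6, 0.4
print(pooled)        # about 0.51
print(partial)       # each estimate pulled part of the way towards the pooled value
```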
Note that because of the conditionally independent structure of hierarchical models, Gibbs sampling is often a natural choice for the MCMC sampling strategy.
Suppose we have data on the number of failures (\(y_i\)) for each of 10 pumps in a nuclear plant, together with the length of time (\(t_i\)) over which each pump was observed. We want to model the number of failures with a Poisson likelihood, where the expected number of failures \(\lambda_i\) differs for each pump. Since the observation time differs from pump to pump, we need to scale each \(\lambda_i\) by its observation time \(t_i\).
We now specify the hierarchical model. Note the change of notation from the overview above: \(\theta\) becomes \(\lambda\) (the parameter) and \(\lambda\) becomes \(\beta\) (the hyperparameter), simply because \(\lambda\) is the traditional symbol for the Poisson rate parameter.
The likelihood \(f\) is
\[\prod_{i=1}^{10} \text{Poisson}(y_i \ | \ \lambda_i t_i)\]
We let the prior \(g\) for \(\lambda\) be
\[\text{Gamma}(\alpha, \beta)\]
with \(\alpha = 1.8\) (an improper prior whose integral does not sum to 1)
and let the hyperprior \(h\) for \(\beta\) be
\[\text{Gamma}(\gamma, \delta)\]
with \(\gamma = 0.01\) and \(\delta = 1\).
There are 11 unknown parameters (10 \(\lambda\)s and \(\beta\)) in this hierarchical model.
Up to a normalizing constant, the posterior is
\[p(\lambda, \beta \ | \ y, t) \propto \prod_{i=1}^{10} \text{Poisson}(y_i \ | \ \lambda_i t_i) \, \text{Gamma}(\lambda_i \ | \ \alpha, \beta) \times \text{Gamma}(\beta \ | \ \gamma, \delta)\]
with the conditional distributions needed for Gibbs sampling given by
\[p(\lambda_i \ | \ \lambda_{-i}, \beta, y, t) = \text{Gamma}(y_i + \alpha, t_i + \beta)\]
and
\[p(\beta \ | \ \lambda, y, t) = \text{Gamma}(10\alpha + \gamma, \delta + \sum_{i=1}^{10} \lambda_i)\]
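For completeness (this step is not spelled out in the original text), the conditional for \(\lambda_i\) follows by multiplying the Poisson likelihood term for pump \(i\) by its \(\text{Gamma}(\alpha, \beta)\) prior and reading off the Gamma kernel:

\[p(\lambda_i \ | \ \lambda_{-i}, \beta, y, t) \propto \lambda_i^{y_i} e^{-\lambda_i t_i} \cdot \lambda_i^{\alpha - 1} e^{-\beta \lambda_i} = \lambda_i^{y_i + \alpha - 1} e^{-(t_i + \beta) \lambda_i}\]

which is the kernel of \(\text{Gamma}(y_i + \alpha, t_i + \beta)\). The conditional for \(\beta\) follows in the same way, by collecting the \(\beta^{\alpha} e^{-\beta \lambda_i}\) factors from the ten \(\text{Gamma}(\alpha, \beta)\) priors together with the \(\text{Gamma}(\gamma, \delta)\) hyperprior.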
import numpy as np
from numpy.random import gamma as rgamma # rename so we can use gamma for parameter name

def lambda_update(alpha, beta, y, t):
    # conditional posterior of each lambda_i is Gamma(y_i + alpha, t_i + beta)
    # numpy's gamma is parameterized by shape and scale = 1/rate
    return rgamma(size=len(y), shape=y+alpha, scale=1.0/(t+beta))

def beta_update(alpha, gamma, delta, lambd, y):
    # conditional posterior of beta is Gamma(n*alpha + gamma, delta + sum(lambda_i))
    return rgamma(size=1, shape=len(y) * alpha + gamma, scale=1.0/(delta + lambd.sum()))

def gibbs(niter, y, t, alpha, gamma, delta):
    lambdas_ = np.zeros((niter, len(y)), float)
    betas_ = np.zeros(niter, float)

    lambda_ = y/t   # initialize each lambda_i at its observed failure rate

    for i in range(niter):
        beta_ = beta_update(alpha, gamma, delta, lambda_, y)
        lambda_ = lambda_update(alpha, beta_, y, t)

        betas_[i] = beta_
        lambdas_[i,:] = lambda_

    return betas_, lambdas_
alpha = 1.8
gamma = 0.01
delta = 1.0
beta0 = 1
y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], int)
t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48], float)
niter = 1000
betas, lambdas = gibbs(niter, y, t, alpha, gamma, delta)
print('%.3f' % betas.mean())
print('%.3f' % betas.std(ddof=1))
print(lambdas.mean(axis=0))
print(lambdas.std(ddof=1, axis=0))
2.469
0.692
[ 0.0697  0.1557  0.1049  0.1236  0.6155  0.619   0.809   0.8304  1.2989  1.8404]
[ 0.027   0.0945  0.0396  0.0305  0.2914  0.1355  0.5152  0.529   0.57    0.391 ]
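As a rough sanity check (not part of the original notes), plugging the posterior mean of \(\beta\) into the conditional posterior mean of each \(\lambda_i\), which is \((y_i + \alpha)/(t_i + \beta)\) for a \(\text{Gamma}(y_i + \alpha, t_i + \beta)\) distribution, should give values close to the Gibbs estimates printed above:

```python
# Conditional posterior mean of each lambda_i, evaluated at the posterior mean of beta.
print((y + alpha) / (t + betas.mean()))
```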
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 20))
for i in range(len(lambdas.T)):
    plt.subplot(5,2,i+1)
    plt.plot(lambdas[::10, i])   # thin the chain by keeping every 10th sample
    plt.title(r'Trace for $\lambda$%d' % i)
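It is also worth looking at the hyperparameter itself; the following plot (an addition, not part of the original notes) shows the trace and histogram of the \(\beta\) samples:

```python
# Trace (thinned as above) and histogram for the hyperparameter beta.
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(betas[::10])
plt.title(r'Trace for $\beta$')
plt.subplot(1, 2, 2)
plt.hist(betas, bins=30)
plt.title(r'Posterior samples of $\beta$')
```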
LaTeX for Markov chain diagram
\documentclass[10pt]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usetikzlibrary{automata,arrows,positioning}

\begin{document}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=2.8cm, semithick]
  \tikzstyle{every state}=[fill=white,draw=black,thick,text=black,scale=1]
  \node[state] (A)              {$\theta$};
  \node[state] (B) [right of=A] {$1-\theta$};
  \path (A) edge [bend left]  node[above] {$1$} (B);
  \path (B) edge [bend left]  node[below] {$\frac{\theta}{1-\theta}$} (A);
  \path (A) edge [loop above] node {0} (A);
  \path (B) edge [loop above] node {$1-\frac{\theta}{1-\theta}$} (B);
\end{tikzpicture}
\end{document}
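As a quick numerical check (not part of the original notes), the two-state chain drawn above, with transition probabilities \(1\) and \(\theta/(1-\theta)\), has stationary distribution \((\theta, 1-\theta)\). The sketch below verifies this for one value of \(\theta\), assuming \(\theta < 1/2\) so that \(\theta/(1-\theta)\) is a valid probability:

```python
import numpy as np

theta = 0.3
P = np.array([[0.0, 1.0],
              [theta/(1 - theta), 1 - theta/(1 - theta)]])  # transition matrix from the diagram
pi = np.array([theta, 1 - theta])
print(pi.dot(P))  # should reproduce pi, i.e. [0.3, 0.7]
```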