- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Hierarchical models
Hierarchical models have the following structure: first we specify that the data come from a distribution with parameters \(\theta\)
\[X \sim f(X\ | \ \theta)\]
and that the parameters themselves come from another distribution with hyperparameters \(\lambda\)
\[\theta \sim g(\theta \ | \ \lambda)\]
and finally that \(\lambda\) comes from a prior distribution
\[\lambda \sim h(\lambda)\]
More levels of hierarchy are possible - i.e. you can specify hyper-hyperparameters for the distribution of \(\lambda\), and so on.
The essential idea of the hierarchical model is that because the \(\theta\)s are not independent but rather are drawn from a common distribution with parameter \(\lambda\), we can share information across the \(\theta\)s by also estimating \(\lambda\) at the same time.
As an example, suppose we have data about the proportion of heads after some number of tosses from several coins, and we want to estimate the bias of each coin. We also know that the coins come from the same mint and so might share some common manufacturing defect. There are two extreme approaches - we could estimate the bias of each coin from its coin toss data independently of all the others, or we could pool the results together and estimate the same bias for all coins. Hierarchical models provide a compromise where we shrink individual estimates towards a common estimate.
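A minimal numerical sketch of these three approaches, using made-up coin data and a hand-picked shrinkage strength (a full hierarchical model would instead let the data determine how much to shrink):

```python
import numpy as np

# Hypothetical data: heads and total tosses for four coins from the same mint.
heads = np.array([3.0, 7.0, 12.0, 6.0])
tosses = np.array([10.0, 10.0, 20.0, 15.0])

independent = heads / tosses                # estimate each coin's bias on its own
pooled = heads.sum() / tosses.sum()         # a single bias shared by all coins
kappa = 10.0                                # hand-picked strength of shrinkage
partial = (heads + kappa * pooled) / (tosses + kappa)  # shrink towards the pooled value

print(independent)   # 0.3, 0.7, 0.6, 0.4
print(pooled)        # about 0.51
print(partial)       # each estimate pulled part of the way towards the pooled value
```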
Note that because of the conditionally independent structure of hierarchical models, Gibbs sampling is often a natural choice for the MCMC sampling strategy.
Suppose we have data on the number of failures (\(y_i\)) for each of 10 pumps in a nuclear plant, together with the length of time (\(t_i\)) over which each pump was observed. We want to model the number of failures with a Poisson likelihood, where the expected number of failures \(\lambda_i\) differs for each pump. Since the observation time differs from pump to pump, we need to scale each \(\lambda_i\) by its observation time \(t_i\).
We now specify the hierarchical model. Note the change of notation from the overview above: \(\theta\) becomes \(\lambda\) (the parameter) and \(\lambda\) becomes \(\beta\) (the hyperparameter), simply because \(\lambda\) is the traditional symbol for the Poisson rate parameter.
The likelihood \(f\) is
\[\prod_{i=1}^{10} \text{Poisson}(y_i \ | \ \lambda_i t_i)\]
We let the prior \(g\) for \(\lambda\) be
\[\text{Gamma}(\alpha, \beta)\]
with \(\alpha = 1.8\) (an improper prior whose integral does not sum to 1)
and let the hyperprior \(h\) for \(\beta\) be
\[\text{Gamma}(\gamma, \delta)\]
with \(\gamma = 0.01\) and \(\delta = 1\).
There are 11 unknown parameters (10 \(\lambda\)s and \(\beta\)) in this hierarchical model.
Up to a normalizing constant, the posterior is
\[p(\lambda, \beta \ | \ y, t) \propto \prod_{i=1}^{10} \text{Poisson}(y_i \ | \ \lambda_i t_i) \, \text{Gamma}(\lambda_i \ | \ \alpha, \beta) \times \text{Gamma}(\beta \ | \ \gamma, \delta)\]
with the conditional distributions needed for Gibbs sampling given by
\[p(\lambda_i \ | \ \lambda_{-i}, \beta, y, t) = \text{Gamma}(y_i + \alpha, t_i + \beta)\]
and
\[p(\beta \ | \ \lambda, y, t) = \text{Gamma}(10\alpha + \gamma, \delta + \sum_{i=1}^{10} \lambda_i)\]
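For completeness (this step is not spelled out in the original text), the conditional for \(\lambda_i\) follows by multiplying the Poisson likelihood term for pump \(i\) by its \(\text{Gamma}(\alpha, \beta)\) prior and reading off the Gamma kernel:

\[p(\lambda_i \ | \ \lambda_{-i}, \beta, y, t) \propto \lambda_i^{y_i} e^{-\lambda_i t_i} \cdot \lambda_i^{\alpha - 1} e^{-\beta \lambda_i} = \lambda_i^{y_i + \alpha - 1} e^{-(t_i + \beta) \lambda_i}\]

which is the kernel of \(\text{Gamma}(y_i + \alpha, t_i + \beta)\). The conditional for \(\beta\) follows in the same way, by collecting the \(\beta^{\alpha} e^{-\beta \lambda_i}\) factors from the ten \(\text{Gamma}(\alpha, \beta)\) priors together with the \(\text{Gamma}(\gamma, \delta)\) hyperprior.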
import numpy as np
from numpy.random import gamma as rgamma # rename so we can use gamma for parameter name

def lambda_update(alpha, beta, y, t):
    # conditional posterior of each lambda_i is Gamma(y_i + alpha, t_i + beta)
    # numpy's gamma is parameterized by shape and scale = 1/rate
    return rgamma(size=len(y), shape=y+alpha, scale=1.0/(t+beta))

def beta_update(alpha, gamma, delta, lambd, y):
    # conditional posterior of beta is Gamma(n*alpha + gamma, delta + sum(lambda_i))
    return rgamma(size=1, shape=len(y) * alpha + gamma, scale=1.0/(delta + lambd.sum()))

def gibbs(niter, y, t, alpha, gamma, delta):
    lambdas_ = np.zeros((niter, len(y)), float)
    betas_ = np.zeros(niter, float)

    lambda_ = y/t   # initialize each lambda_i at its observed failure rate

    for i in range(niter):
        beta_ = beta_update(alpha, gamma, delta, lambda_, y)
        lambda_ = lambda_update(alpha, beta_, y, t)

        betas_[i] = beta_
        lambdas_[i,:] = lambda_

    return betas_, lambdas_
alpha = 1.8
gamma = 0.01
delta = 1.0
beta0 = 1
y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], int)
t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48], float)
niter = 1000
betas, lambdas = gibbs(niter, y, t, alpha, gamma, delta)
print('%.3f' % betas.mean())
print('%.3f' % betas.std(ddof=1))
print(lambdas.mean(axis=0))
print(lambdas.std(ddof=1, axis=0))
2.469
0.692
[ 0.0697  0.1557  0.1049  0.1236  0.6155  0.619   0.809   0.8304  1.2989  1.8404]
[ 0.027   0.0945  0.0396  0.0305  0.2914  0.1355  0.5152  0.529   0.57    0.391 ]
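As a rough sanity check (not part of the original notes), plugging the posterior mean of \(\beta\) into the conditional posterior mean of each \(\lambda_i\), which is \((y_i + \alpha)/(t_i + \beta)\) for a \(\text{Gamma}(y_i + \alpha, t_i + \beta)\) distribution, should give values close to the Gibbs estimates printed above:

```python
# Conditional posterior mean of each lambda_i, evaluated at the posterior mean of beta.
print((y + alpha) / (t + betas.mean()))
```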
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 20))
for i in range(len(lambdas.T)):
    plt.subplot(5,2,i+1)
    plt.plot(lambdas[::10, i])   # thin the chain by keeping every 10th sample
    plt.title(r'Trace for $\lambda$%d' % i)
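It is also worth looking at the hyperparameter itself; the following plot (an addition, not part of the original notes) shows the trace and histogram of the \(\beta\) samples:

```python
# Trace (thinned as above) and histogram for the hyperparameter beta.
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(betas[::10])
plt.title(r'Trace for $\beta$')
plt.subplot(1, 2, 2)
plt.hist(betas, bins=30)
plt.title(r'Posterior samples of $\beta$')
```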
LaTeX for Markov chain diagram
\documentclass[10pt]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usetikzlibrary{automata,arrows,positioning}

\begin{document}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=2.8cm, semithick]
  \tikzstyle{every state}=[fill=white,draw=black,thick,text=black,scale=1]
  \node[state] (A)              {$\theta$};
  \node[state] (B) [right of=A] {$1-\theta$};
  \path (A) edge [bend left]  node[above] {$1$} (B);
  \path (B) edge [bend left]  node[below] {$\frac{\theta}{1-\theta}$} (A);
  \path (A) edge [loop above] node {0} (A);
  \path (B) edge [loop above] node {$1-\frac{\theta}{1-\theta}$} (B);
\end{tikzpicture}
\end{document}
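As a quick numerical check (not part of the original notes), the two-state chain drawn above, with transition probabilities \(1\) and \(\theta/(1-\theta)\), has stationary distribution \((\theta, 1-\theta)\). The sketch below verifies this for one value of \(\theta\), assuming \(\theta < 1/2\) so that \(\theta/(1-\theta)\) is a valid probability:

```python
import numpy as np

theta = 0.3
P = np.array([[0.0, 1.0],
              [theta/(1 - theta), 1 - theta/(1 - theta)]])  # transition matrix from the diagram
pi = np.array([theta, 1 - theta])
print(pi.dot(P))  # should reproduce pi, i.e. [0.3, 0.7]
```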