
Coin toss

We'll repeat the example of determining the bias of a coin from observed coin tosses. The likelihood is binomial, and we use a beta prior. (The examples below assume that pystan, numpy as np, and matplotlib.pyplot as plt have already been imported.)
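
Since the beta prior is conjugate to the binomial likelihood, the posterior is available in closed form and provides a check on the sampler:

\[p \mid y \sim \textrm{Beta}(\alpha + y, \beta + n - y)\]

With a Beta(2, 2) prior and y = 61 heads out of n = 100 tosses, the posterior is Beta(63, 41), whose mean 63/104 ≈ 0.61 should match the sampled estimate below.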

coin_code = """
data {
    int<lower=0> n; // number of tosses
    int<lower=0> y; // number of heads
}
transformed data {}
parameters {
    real<lower=0, upper=1> p;
}
transformed parameters {}
model {
    p ~ beta(2, 2);
    y ~ binomial(n, p);
}
generated quantities {}
"""

coin_dat = {
             'n': 100,
             'y': 61,
            }

fit = pystan.stan(model_code=coin_code, data=coin_dat, iter=1000, chains=1)

Loading from a file

The model string in coin_code can also be stored in a file, say coin_code.stan, which we can then use like so:

fit = pystan.stan(file='coin_code.stan', data=coin_dat, iter=1000, chains=1)
print(fit)
Inference for Stan model: anon_model_7f1947cd2d39ae427cd7b6bb6e6ffd77.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.

       mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
p      0.61  4.9e-3   0.05   0.51   0.57   0.61   0.64   0.69   91.0    1.0
lp__ -70.22    0.06   0.66 -71.79 -70.43 -69.97 -69.79 -69.74  134.0    1.0

Samples were drawn using NUTS(diag_e) at Wed Mar 18 08:54:14 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
coin_dict = fit.extract()
coin_dict.keys()
# lp__ is the log posterior
[u'p', u'lp__']
fit.plot('p');
plt.tight_layout()
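
As a quick sanity check (an addition, not part of the original example), we can overlay the analytic Beta(63, 41) posterior on a histogram of the draws; this sketch assumes scipy is available.

# Compare the MCMC draws with the analytic conjugate posterior Beta(63, 41).
from scipy import stats

ps = np.linspace(0.4, 0.8, 200)
plt.figure()
plt.hist(coin_dict['p'], 25, density=True, histtype='step', label='MCMC draws')
plt.plot(ps, stats.beta(63, 41).pdf(ps), label='Beta(63, 41) density')
plt.legend();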

Estimating the mean and standard deviation of a normal distribution

\[X \sim \mathcal{N}(\mu, \sigma^2)\]

norm_code = """
data {
    int<lower=0> n;
    real y[n];
}
transformed data {}
parameters {
    real<lower=0, upper=100> mu;
    real<lower=0, upper=10> sigma;
}
transformed parameters {}
model {
    y ~ normal(mu, sigma);
}
generated quantities {}
"""

norm_dat = {
             'n': 100,
             'y': np.random.normal(10, 2, 100),
            }

fit = pystan.stan(model_code=norm_code, data=norm_dat, iter=1000, chains=1)
print(fit)
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu     10.09    0.02   0.19   9.72   9.97  10.09  10.22  10.49  120.0    1.0
sigma   2.02    0.01   0.15   1.74   1.92   2.01   2.12   2.32  119.0   1.01
lp__  -117.2    0.11   1.08 -120.0 -117.5 -116.8 -116.4 -116.2  105.0    1.0

Samples were drawn using NUTS(diag_e) at Wed Mar 18 08:54:50 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
trace = fit.extract()
plt.figure(figsize=(10,4))
plt.subplot(1,2,1);
plt.hist(trace['mu'][:], 25, histtype='step');
plt.subplot(1,2,2);
plt.hist(trace['sigma'][:], 25, histtype='step');

Optimization (finding MAP)

The optimizing method of a compiled model returns a point estimate that maximizes the log posterior (the MAP estimate) instead of drawing samples.

sm = pystan.StanModel(model_code=norm_code)
op = sm.optimizing(data=norm_dat)
op
OrderedDict([(u'mu', array(10.3016473417206)), (u'sigma', array(1.8823589782831152))])
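
Because the priors here are flat within the declared bounds, the MAP estimate should land close to the posterior means from the sampling run above. A quick comparison (an addition, not in the original):

# The MAP point estimates should be close to the posterior means.
print(op['mu'], trace['mu'].mean())
print(op['sigma'], trace['sigma'].mean())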

Reusing fitted objects

Passing an existing fit object via the fit argument reuses its compiled model, so the same Stan program can be run on new data without recompiling.

new_dat = {
             'n': 100,
             'y': np.random.normal(10, 2, 100),
            }
fit2 = pystan.stan(fit=fit, data=new_dat, chains=1)
print(fit2)
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu      9.89    0.01   0.19   9.54   9.76    9.9  10.02  10.27  250.0    1.0
sigma   1.99  9.3e-3   0.15   1.72   1.89   1.98   2.07   2.33  250.0    1.0
lp__  -115.4    0.08   1.01 -118.1 -115.8 -115.1 -114.7 -114.5  153.0    1.0

Samples were drawn using NUTS(diag_e) at Wed Mar 18 08:58:32 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Saving compiled models

We can also compile Stan models and save them to a file, so they can be reloaded later without recompiling.

def save(obj, filename):
    """Save compiled models for reuse."""
    import pickle
    # Use binary mode: pickled models are binary data.
    with open(filename, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load(filename):
    """Reload compiled models for reuse."""
    # Note: pystan must be imported before unpickling a StanModel.
    import pickle
    with open(filename, 'rb') as f:
        return pickle.load(f)
model = pystan.StanModel(model_code=norm_code)
save(model, 'norm_model.pic')
new_model = load('norm_model.pic')
fit4 = new_model.sampling(new_dat, chains=1)
print(fit4)
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu      9.91    0.01    0.2    9.5   9.78   9.91  10.05   10.3  283.0    1.0
sigma    2.0  9.3e-3   0.15   1.73    1.9   1.99   2.09   2.31  244.0    1.0
lp__  -115.5    0.08   1.03 -118.2 -115.8 -115.1 -114.8 -114.5  153.0   1.01

Samples were drawn using NUTS(diag_e) at Wed Mar 18 09:18:30 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
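
Reloading the pickled model skips the C++ compilation step, which usually dominates the run time. A rough way to check this (an addition, not in the original; timings depend on your machine and compiler):

import time

t0 = time.time()
_ = pystan.StanModel(model_code=norm_code)  # recompiles from scratch
t1 = time.time()
_ = load('norm_model.pic')                  # just unpickles the compiled model
t2 = time.time()
print('compile: %.1f s, load: %.3f s' % (t1 - t0, t2 - t1))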

Estimating parameters of a linear regression model

We will show how to estimate regression parameters using the simple linear model

\[y \sim ax + b\]

We can restate the linear model

\[y = ax + b + \epsilon\]

as sampling from a probability distribution

\[y \sim \mathcal{N}(ax + b, \sigma^2)\]

We will assume the following priors

\[\begin{split}a \sim \mathcal{N}(0, 100) \\ b \sim \mathcal{N}(0, 100) \\ \sigma \sim \mathcal{U}(0, 20)\end{split}\]

lin_reg_code = """
data {
    int<lower=0> n;
    real x[n];
    real y[n];
}
transformed data {}
parameters {
    real a;
    real b;
    real<lower=0, upper=20> sigma;
}
transformed parameters {
    real mu[n];
    for (i in 1:n) {
        mu[i] <- a*x[i] + b;
    }
}
model {
    a ~ normal(0, 100);  // weakly informative priors; Stan's normal takes the sd
    b ~ normal(0, 100);
    sigma ~ uniform(0, 20);
    y ~ normal(mu, sigma);
}
generated quantities {}
"""

n = 11
_a = 6
_b = 2
x = np.linspace(0, 1, n)
y = _a*x + _b + np.random.randn(n)

lin_reg_dat = {
             'n': n,
             'x': x,
             'y': y
            }

fit = pystan.stan(model_code=lin_reg_code, data=lin_reg_dat, iter=1000, chains=1)
print(fit)
fit.plot(['a', 'b']);
plt.tight_layout()
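
To visualize the fit (an addition, not in the original), we can draw the regression line implied by the posterior means of a and b over the simulated data.

# Plot the simulated data with the posterior-mean regression line.
trace = fit.extract()
a_hat, b_hat = trace['a'].mean(), trace['b'].mean()
plt.figure()
plt.scatter(x, y, label='data')
plt.plot(x, a_hat*x + b_hat, 'r-', label='posterior mean fit')
plt.legend();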
