Estimating parameters using Stan when the distribution for the response variable in a regression is non-normal - Part 2
This is an extension of my previous post, "Estimating parameters using stan when the distribution for response variable in a regression is non-normal".
Let's say I have the following data:
dat = list(y = c(0.00792354094929414, 0.00865300734292492, 0.0297400780486734,
0.0196358416326437, 0.00239020640762042, 0.0258055591736283,
0.17394835142698, 0.156463554455613, 0.329388185725557, 0.00764435088817635,
0.0162081480398152, 0, 0.00157591399416963, 0.420025972703085,
0.000122623651944455, 0.133061480234834, 0.565454216154227, 0.000281973481299731,
0.000559715156383041, 0.0270686389659072, 0.918300537689865,
0.00000782624683025728, 0.00732414341919458, 0, 0, 0, 0, 0, 0,
0, 0.174071274611405, 0.0432109713717948, 0.0544400838264943,
0, 0.0907049925221286, 0.616680102647887, 0, 0), x = c(23.8187587698947,
15.9991138359515, 33.6495930512881, 28.555818797764, -52.2967967248258,
-91.3835208788233, -73.9830692708321, -5.16901145289629, 29.8363012310241,
10.6820057903939, 19.4868517164395, 15.4499668436458, -17.0441644773509,
10.7025053739577, -8.6382953428539, -32.8892974839165, -15.8671863161348,
-11.237248036145, -7.37978020066205, -3.33500586334862, -4.02629933182873,
-20.2413384726948, -54.9094885578775, -48.041459120976, -52.3125732905322,
-35.6269065970458, -62.0296155423529, -49.0825017152659, -73.0574478287598,
-50.9409090127938, -63.4650928035253, -55.1263264283842, -52.2841103768755,
-61.2275334149805, -74.2175990067417, -68.2961107804698, -76.6834643609286,
-70.16769103228), N = 38)
I want to fit a logit model to the above data, treating y as a fractional response variable. Below is my Stan model code:
library(rstan)

model = "
data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}

transformed data {
    vector[N] z = bernoulli_rng(y);
}

parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}

transformed parameters {
    vector[N] mu;
    mu = alpha + beta * x;
}

model {
    sigma ~ normal(0, 1);
    alpha ~ normal(0, 1);
    beta ~ normal(0, 1);
    z ~ bernoulli(mu);
}
"

sampling(stan_model(model_code = model), data = dat, chains = 4, iter = 50000, refresh = 0)
With this, I get the following error:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable definition base type mismatch, variable declared as base type vector variable definition has base type int[ ] error in 'model93e37bdec88_3b62e3bb17b9f3ed9c717c98aa6ca9ac' at line 9, column 32
-------------------------------------------------
7:
8: transformed data {
9: vector[N] z = bernoulli_rng(y);
^
10: }
-------------------------------------------------
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'sampling': failed to parse Stan model '3b62e3bb17b9f3ed9c717c98aa6ca9ac' due to the above error.
Could you please help me find the correct specification of the Stan model?
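For reference, the parser error comes from the types. With a vector argument, `bernoulli_rng` returns an integer array (the `int[ ]` in the message), so `z` cannot be declared as a `vector`. The minimal type fix is `int z[N] = bernoulli_rng(y);`, or `array[N] int z = bernoulli_rng(y);` in newer Stan syntax. Two more problems would surface after that:

- `bernoulli(mu)` expects a probability in [0, 1], but `mu = alpha + beta * x` is unbounded. `bernoulli_logit(mu)` is the logit-link form.
- `sigma` never enters the likelihood.

Simulating `z` also replaces each proportion with a single random 0/1 draw. Below is a minimal sketch of an alternative. It assumes a fractional logit is the intended model: the Bernoulli log-likelihood evaluated at the fractional `y`, i.e. a quasi-likelihood. `frac_logit_code` and `fit_frac_logit` are hypothetical names, and the `normal(0, 1)` priors are carried over from the question.

```r
# Sketch: Bayesian fractional logit in Stan, fitted through rstan.
# Assumption: y is a proportion in [0, 1]; exact 0s (and 1s) are allowed,
# because the Bernoulli log-likelihood is evaluated at the fractional y
# rather than at simulated 0/1 draws.
frac_logit_code <- "
data {
  int<lower=0> N;
  vector[N] x;
  vector<lower=0, upper=1>[N] y;
}
parameters {
  real alpha;
  real beta;
}
model {
  // linear predictor on the logit scale
  vector[N] eta = alpha + beta * x;
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  // quasi-log-likelihood: y * log(p) + (1 - y) * log(1 - p), p = inv_logit(eta)
  for (n in 1:N)
    target += y[n] * log_inv_logit(eta[n])
              + (1 - y[n]) * log1m_inv_logit(eta[n]);
}
"

# Hypothetical helper: compiles the program and samples (requires rstan).
fit_frac_logit <- function(dat, chains = 4, iter = 2000) {
  model <- rstan::stan_model(model_code = frac_logit_code)
  rstan::sampling(model, data = dat, chains = chains, iter = iter, refresh = 0)
}
```

Usage, with `dat` from the question: `fit <- fit_frac_logit(dat)`, then `print(fit, pars = c("alpha", "beta"))`. A quasi-likelihood is not a proper likelihood for fractional data, so the posterior intervals from this sketch should be read with some caution.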
Comments (1)
There might be a deeper issue here than how to model saturated probabilities (probabilities that are exactly 0 or exactly 1).
Here is a plot of your data. Visually, there isn't much of a relationship between x and y.
[Plot: y against x]
Created on 2022-03-13 by the reprex package (v2.0.1)
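To put a number on that visual impression without Stan, one option is the frequentist fractional logit. This sketch assumes it is an acceptable stand-in for the Bayesian model. Base R's `glm` with `family = quasibinomial(link = "logit")` fits a logit mean model to a response in [0, 1], exact zeros included. `frac_logit_glm` is a hypothetical helper name.

```r
# Sketch: frequentist fractional logit via quasi-likelihood in base R (stats).
# Assumption: dat is a list or data frame with numeric x and y, y in [0, 1].
frac_logit_glm <- function(dat) {
  # quasibinomial() accepts non-integer responses in [0, 1] without the
  # "non-integer #successes" warning that binomial() gives, and it estimates
  # a dispersion parameter that is used for the standard errors.
  glm(y ~ x, data = dat, family = quasibinomial(link = "logit"))
}
```

With the question's `dat`, `summary(frac_logit_glm(dat))$coefficients` shows the slope on `x` together with its standard error.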
And things don't get better on the logit scale, i.e., with the transformation z = logit(y).
[Plot: z = logit(y) against x]
Created on 2022-03-13 by the reprex package (v2.0.1)
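The saturated values matter on that scale. The logit is not finite at 0 or 1, and 11 of the 38 values of `y` in the question's data are exactly 0 (none are exactly 1). Base R's `qlogis()` is the logit function, so a quick check is:

```r
# qlogis(p) computes log(p / (1 - p)), i.e. the logit.
# At the boundaries p = 0 and p = 1 the result is -Inf and Inf.
p <- c(0, 0.5, 1)
z <- qlogis(p)
print(z)   # -Inf, 0, Inf
```

With the question's data, `sum(dat$y == 0)` and `sum(!is.finite(qlogis(dat$y)))` both return 11. Those observations therefore cannot appear on a plot of logit(y).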