GAM error when using bam() with family = betar

Posted on 2025-02-07 18:51:00

I'm having trouble solving an error I am getting when running bam() from mgcv.

I note that a similar error was reported here 14 months ago, and there seemed to be no agreed-upon solution; the suggestion was to email Simon Wood.

My data are here. The data set is too big to paste the output of dput().

If I run the model below on the entire data set, I get the following warnings:

library(mgcv)

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error, 
          method = "fREML", 
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1,  :
  step failure in theta estimation
2: In wt * LS :
  longer object length is not a multiple of shorter object length
3: In muth * (log(y) - log1p(-y)) :
  longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
6: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
7: In prior.weights * y :
  longer object length is not a multiple of shorter object length
8: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
  longer object length is not a multiple of shorter object length

However, if I run the same model on the entire data set but exclude the last row, the model appears to run OK:

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[1:20500,], 
          method = "fREML", 
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

This suggested to me that something was wrong with the last row of the data set. However, I cannot see anything in that row that I would expect to produce the above warning messages.

If I run the same model on a small subset of the data, this time including the last row, the model again appears to run OK:

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[20400:20501,], 
          method = "fREML", 
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

But a larger subset of the data, again including the last row, produces warning messages similar to those above:

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[10000:20501,], 
          method = "fREML", 
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

Warning messages:
1: In wt * LS :
  longer object length is not a multiple of shorter object length
2: In muth * (log(y) - log1p(-y)) :
  longer object length is not a multiple of shorter object length
3: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
6: In prior.weights * y :
  longer object length is not a multiple of shorter object length
7: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
  longer object length is not a multiple of shorter object length
8: In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method,  :
  algorithm did not converge

Any advice appreciated.


静谧 2025-02-14 18:51:00

I suspect the problem is with your eps (which probably does indicate that you have issues with the data).

The default is:

> .Machine$double.eps*100
[1] 2.220446e-14

so you are truncating all your response values to the interval [eps, 1-eps] (i.e. any y < eps or y > 1-eps is reset to eps or 1-eps respectively), which with eps = 0.1 is [0.1, 0.9]. I suppose that is causing problems with the fitting algorithm, which is encountering situations that were not anticipated. If a non-trivial number of values lie outside [eps, 1-eps], you will be piling all of them up on the limits of the range, and I suspect that leads to situations where subtle changes in the data cause numerical problems in the fitting algorithm.
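
To get a feel for how much of the response is affected, something along these lines may help. It is only a rough sketch in base R: the vector `y` is simulated, because the real `pt10` column is not shown in the post, and the helpers `clamp()` and `share_clamped()` are hypothetical names introduced here, not mgcv functions. They simply mimic the reset to [eps, 1-eps] described above.

```r
# Simulated stand-in for the response; with the real data one would use
# y <- error$pt10 instead (assumption: pt10 is a proportion in [0, 1]).
set.seed(1)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)

# Reset values below eps to eps and values above 1 - eps to 1 - eps.
clamp <- function(y, eps) pmin(pmax(y, eps), 1 - eps)

# Share of observations that would be moved onto one of the two limits.
share_clamped <- function(y, eps) mean(y < eps | y > 1 - eps)

eps_default <- .Machine$double.eps * 100  # default quoted above
eps_user    <- 0.1                        # value used in the question

share_default <- share_clamped(y, eps_default)
share_user    <- share_clamped(y, eps_user)

# Response as it would look after truncation with eps = 0.1.
y_clamped <- clamp(y, eps_user)

cat("share reset with default eps:", share_default, "\n")
cat("share reset with eps = 0.1:  ", share_user, "\n")
cat("values sitting on a limit:   ",
    sum(y_clamped == eps_user | y_clamped == 1 - eps_user), "\n")
```

If the share for eps = 0.1 turns out to be large on the real data, that would be consistent with the pile-up at the limits suspected above, though it would not by itself prove that this is what triggers the warnings.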

Truncating the data as much as you are doing suggests this is not the right distribution for your data. Absent any other information I'd look elsewhere for a more suitable method.
