Avoiding duplication in R

Posted 2025-02-10 18:38:33

I am trying to fit a variety of (truncated) probability distributions to the same very thin set of quantiles. I can do it but it seems to require lots of duplication of the same code. Is there a neater way?

I am using this code by Nadarajah and Kotz to compute the quantile function of the truncated distributions:

qtrunc <- function(p, spec, a = -Inf, b = Inf, ...)
{
  tt <- p
  G <- get(paste("p", spec, sep = ""), mode = "function")
  Gin <- get(paste("q", spec, sep = ""), mode = "function")
  tt <- Gin(G(a, ...) + p*(G(b, ...) - G(a, ...)), ...)
  return(tt)
}

where spec can be the name of any untruncated distribution for which R provides p and q functions (for example "gamma" or "norm"), and the ... argument passes that distribution's parameters by name.
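
As a quick sanity check of what qtrunc returns, the sketch below evaluates it in two cases where the answer can be verified independently. The variable names q_unif, q_gam and p_back are illustrative only, and qtrunc is repeated (in slightly condensed form) so that the snippet runs on its own.

```r
# qtrunc as given above, repeated so this snippet runs on its own
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  G   <- get(paste0("p", spec), mode = "function")  # untruncated CDF
  Gin <- get(paste0("q", spec), mode = "function")  # untruncated quantile function
  Gin(G(a, ...) + p * (G(b, ...) - G(a, ...)), ...)
}

# Uniform(0, 1) truncated to [0.2, 0.6] is Uniform(0.2, 0.6),
# so its median should be 0.4
q_unif <- qtrunc(0.5, "unif", a = 0.2, b = 0.6)

# Round trip for a truncated gamma: evaluating the truncated CDF
# at the returned quantile should give back p = 1/3
q_gam <- qtrunc(1/3, "gamma", a = 0, b = 20, shape = 2, rate = 3)
p_back <- (pgamma(q_gam, shape = 2, rate = 3) - pgamma(0, shape = 2, rate = 3)) /
  (pgamma(20, shape = 2, rate = 3) - pgamma(0, shape = 2, rate = 3))
```

Both checks rely only on the p and q functions that qtrunc looks up by name, which is why the same function works for any distribution that follows R's naming convention.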

To achieve the best fit I need to measure the distance between the given quantiles and those calculated using arbitrary values of the parameters of the distribution. In the case of the gamma distribution, for example, the code is as follows:

spec <- "gamma"
fit_gamma <- function(x, l = 0,   h = 20, t1 = 5, t2 = 13){
  ct1 <- qtrunc(p = 1/3, spec, a = l, b = h, shape = x[1],rate = x[2])
  ct2 <- qtrunc(p = 2/3, spec, a = l, b = h, shape = x[1],rate = x[2])
  dist <- vector(mode = "numeric", length = 2) 
  dist[1] <- (t1 - ct1)^2
  dist[2] <- (t2- ct2)^2
  return(sqrt(sum(dist)))
}

where l is the lower truncation point, h is the upper truncation point, and t1 and t2 are the two tertiles I am given.

Finally, I seek the best fit using optim, thus:

gamma_fit <- optim(par = c(2, 4), 
                fn = fit_gamma, 
                l = l, 
                h = h,
                t1 = t1,
                t2 = t2,
                method = "L-BFGS-B",
                lower = c(1.01, 1.4))

Now suppose I want to do the same thing but fitting a normal distribution instead. The names of the parameters of the normal distribution that I am using in R are mean and sd.

I can achieve what I want but only by writing a whole new function fit_normal that is extremely similar to my fit_gamma function but with the new parameter names used in the definition of ct1 and ct2.

The problem of duplication of code becomes very severe because I wish to try fitting a large number of different distributions to my data.

What I want to know is whether there is a way of writing a generic fit_spec as it were so that the parameter names do not have to be written out by me.

完美的未来在梦里 2025-02-17 18:38:33

Use x as a named list to create a list of arguments to pass into qtrunc() using do.call().

fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13){
  args <- c(x, list(spec = spec, a = l, b = h))
  
  ct1 <- do.call(qtrunc, args = c(list(p = 1/3), args))
  ct2 <- do.call(qtrunc, args = c(list(p = 2/3), args))
  dist <- vector(mode = "numeric", length = 2) 
  dist[1] <- (t1 - ct1)^2
  dist[2] <- (t2 - ct2)^2
  return(sqrt(sum(dist)))
}

Call it as follows; it returns the same value as your original function.

fit_distro(list(shape = 2, rate = 3), "gamma")
# [1] 13.07425

fit_gamma(c(2, 3))
# [1] 13.07425

This will work with other distributions, for however many parameters they have.

fit_distro(list(mean = 10, sd = 3), "norm")
# [1] 4.08379

fit_distro(list(shape1 = 2, shape2 = 3, ncp = 10), "beta")
# [1] 12.98371
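
To connect this back to the optim step in the question, here is one possible sketch of a generic fitting wrapper. The names quantile_dist and fit_spec, the probs and targets arguments, and the large-penalty guard are assumptions made for illustration, not part of the original code. The idea is that the names of the starting vector carry the parameter names, so nothing distribution-specific has to be written out.

```r
# qtrunc from the question, repeated so this snippet runs on its own
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  G   <- get(paste0("p", spec), mode = "function")
  Gin <- get(paste0("q", spec), mode = "function")
  Gin(G(a, ...) + p * (G(b, ...) - G(a, ...)), ...)
}

# Distance between the target quantiles and the truncated quantiles implied by x.
# x is a named numeric vector or list whose names match the parameter names
# of the p<spec>/q<spec> functions.
quantile_dist <- function(x, spec, probs, targets, l = 0, h = 20) {
  args <- c(list(p = probs, spec = spec, a = l, b = h), as.list(x))
  q <- do.call(qtrunc, args)
  # Assumption: return a large finite penalty for non-finite quantiles,
  # because L-BFGS-B stops with an error on a non-finite objective
  if (!all(is.finite(q))) return(1e10)
  sqrt(sum((targets - q)^2))
}

# Hypothetical generic wrapper: 'start' is a named vector of initial values
fit_spec <- function(spec, start, lower = -Inf, upper = Inf, ...) {
  obj <- function(par) {
    names(par) <- names(start)  # make sure the parameter names are attached
    quantile_dist(par, spec, ...)
  }
  optim(par = start, fn = obj, method = "L-BFGS-B",
        lower = lower, upper = upper)
}

gamma_fit <- fit_spec("gamma", start = c(shape = 2, rate = 4),
                      lower = c(1.01, 1.4),
                      probs = c(1/3, 2/3), targets = c(5, 13))

norm_fit <- fit_spec("norm", start = c(mean = 10, sd = 3),
                     lower = c(-Inf, 0.01),
                     probs = c(1/3, 2/3), targets = c(5, 13))
```

Adding another distribution then only requires a new spec string and a named start vector. The bounds used for the normal fit are illustrative and would need to be chosen per distribution.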

About the author

撑一把青伞
