如何解决 R 中 for 循环的问题

发布于 2025-01-13 18:21:31 字数 2625 浏览 2 评论 0原文

我正在为想要根据 AADTMAJ、L 和 Base_Past 预测值的人们编写一个包。该函数提供两个选项:1)允许用户输入自己的回归系数,或2)为用户提供预定义的系数。但是,我无法正确使用 return() 。

输入数据

data=data.frame(Base_Past=c("HSM-RUR2U-KABCO",
                            "HSM-RUR2U-KABCO",
                            "HSM-RUR4-KABC",
                            "HSM-RUR4-KABCO"),
                AADTMAJ=c(100,100,100,100),
                L=c(1,1,1,1)
)

输入自定义回归系数

custom.spf=data.frame(Base_Past=c("HSM-RUR2U-KABCO","HSM-RUR2U-KABC"), a=c(-0.312,-0.19))

定义辅助函数

helper_function = function (data, Base_Past=FALSE, override=custom.spf){
  if (is.data.frame(override)){
    for (j in 1:nrow(override)){
      for (i in 1:nrow(data)){
        if(data[i, ]$Base_Past==override[j, ]$Base_Past){
          output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(override[j, ]$a))
          return(output)} else{
            if(data[i, ]$Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
            if(data[i, ]$Base_Past=="HSM-RUR4-KABC") {a=-0.143}
            output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
            return(output)
          } 
      }
    }
  }
  
  else if (!is.data.frame(override)){
    if(Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
    if(Base_Past=="HSM-RUR4-KABC") {a=-0.143}
    output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
    return(output)
  }
}

运行

(data %>% dplyr::rowwise() %>% dplyr::mutate(predicted_value = helper_function(data = data, override=custom.spf)))[,4]



输出

# A tibble: 4 x 1
# Rowwise: 
  predicted_value
            <dbl>
1          0.0267
2          0.0267
3          0.0267
4          0.0267

替代

data %>% dplyr::mutate(predicted_value=dplyr::case_when(Base_Past =="HSM-RUR4-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.101),
                                                        Base_Past=="HSM-RUR4-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.143),
                                                        Base_Past=="HSM-RUR2U-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.312),
                                                        Base_Past=="HSM-RUR2U-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.190),
                                                        TRUE ~ NA_real_))

所需输出

        Base_Past AADTMAJ L predicted_value
1 HSM-RUR2U-KABCO     100 1      0.02671733
2 HSM-RUR2U-KABCO     100 1      0.02671733
3   HSM-RUR4-KABC     100 1      0.03163652
4  HSM-RUR4-KABCO     100 1      0.03299356

i am writing a package for folks who want to predict values base on AADTMAJ, L, and Base_Past. The function provides two options 1) allow the user to enter there own regression coefficients, or 2) provide the user with pre defined coefficients. However, i have not been able to use return() correctly .

input data

data=data.frame(Base_Past=c("HSM-RUR2U-KABCO",
                            "HSM-RUR2U-KABCO",
                            "HSM-RUR4-KABC",
                            "HSM-RUR4-KABCO"),
                AADTMAJ=c(100,100,100,100),
                L=c(1,1,1,1)
)

input custom regression coefficients

custom.spf=data.frame(Base_Past=c("HSM-RUR2U-KABCO","HSM-RUR2U-KABC"), a=c(-0.312,-0.19))

define helper function

helper_function = function (data, Base_Past=FALSE, override=custom.spf){
  if (is.data.frame(override)){
    for (j in 1:nrow(override)){
      for (i in 1:nrow(data)){
        if(data[i, ]$Base_Past==override[j, ]$Base_Past){
          output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(override[j, ]$a))
          return(output)} else{
            if(data[i, ]$Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
            if(data[i, ]$Base_Past=="HSM-RUR4-KABC") {a=-0.143}
            output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
            return(output)
          } 
      }
    }
  }
  
  else if (!is.data.frame(override)){
    if(Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
    if(Base_Past=="HSM-RUR4-KABC") {a=-0.143}
    output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
    return(output)
  }
}

run

(data %>% dplyr::rowwise() %>% dplyr::mutate(predicted_value = helper_function(data = data, override=custom.spf)))[,4]



Output

# A tibble: 4 x 1
# Rowwise: 
  predicted_value
            <dbl>
1          0.0267
2          0.0267
3          0.0267
4          0.0267

alternative

data %>% dplyr::mutate(predicted_value=dplyr::case_when(Base_Past =="HSM-RUR4-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.101),
                                                        Base_Past=="HSM-RUR4-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.143),
                                                        Base_Past=="HSM-RUR2U-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.312),
                                                        Base_Past=="HSM-RUR2U-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.190),
                                                        TRUE ~ NA_real_))

desired output

        Base_Past AADTMAJ L predicted_value
1 HSM-RUR2U-KABCO     100 1      0.02671733
2 HSM-RUR2U-KABCO     100 1      0.02671733
3   HSM-RUR4-KABC     100 1      0.03163652
4  HSM-RUR4-KABCO     100 1      0.03299356

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

请恋爱 2025-01-20 18:21:31

该功能以及您对它的使用存在几个问题。自我第一批评论以来,问题列表中值得注意的是:

  • 您在 rowwise 管道中调用它,但随后传递 data=data,这意味着它是 <忽略管道中的数据,而是着眼于整个事情。您可以改为使用 data=cur_data() (因为它位于 mutate 内部,所以这是有效的,因为 cur_data() 是由 定义的code>dplyr 对于类似这样的情况)。

  • 假设 custom.spf 已定义且可用,您的 helper_function 定义不明确。函数依赖于未显式传递给它的外部变量的存在,这使得函数变得脆弱并且很难进行故障排除。例如,如果在调用环境中未定义 custom.spf,则此函数将失败,并找不到对象“custom.spf”。相反,我认为你可以使用:

    helper_function <- function(..., override=NA) {
      if (isTRUE(is.na(override)) && 存在("custom.spf")) {
        message(“找到‘custom.spf’,将其用作‘覆盖’”)
        覆盖<-custom.spf
      }
      ...
    }
    

    我仍然对此并不完全兴奋,但至少它不会太快失败,并且它所做的事情很详细。

  • 如果以编程方式使用1:nrow(.)可能有一点风险。也就是说,如果由于某种原因,其中一个输入有 0 行(也许 custom.spf 没有任何可覆盖的内容),那么 1:nrow(.) 逻辑上应该不执行任何操作而是会对不存在的行进行两次迭代。也就是说,如果 nrow(.) 为 0,那么请注意 1:0 返回 c(1, 0),这显然不是“什么都不做”。相反,请使用 seq_len(nrow(.)),因为 seq_len(0) 返回 integer(0),这正是我们想要的。

  • 这里没有理由使用rowwise(),并且应尽可能避免使用它。 (它做得很好,当确实有必要时,它效果很好。但是一次迭代一行的性能损失可能很大,特别是对于较大的数据。)

您尝试做的一些事情可以通过学习合并/连接方法来简化。关于合并/连接的两个非常好的参考是:如何连接(合并)数据框(内部、外部、左、右), INNER JOIN、LEFT JOIN、RIGHT JOIN 有什么区别和完全加入?

此外,您的工作的很大一部分似乎是为方程的 a 分配一个合理的值。您的内部代码(寻找 "-KABCO""-KABC")看起来确实应该是另一个默认值框架。

这里有一个建议的helper_function,它会稍微改变一些事情。它采用 Base_PastAADTMAJL 作为强制参数,然后合并/连接零个或多个帧以找到合适的值对于等式中的a

helper_function <- function(Base_Past, AADTMAJ, L, ...) {
  stopifnot(
    length(Base_Past) == length(AADTMAJ),
    length(Base_Past) == length(L)
  )
  defaults <- data.frame(Base_Past = c("HSM-RUR4-KABCO", "HSM-RUR4-KABC"), a = c(-0.101, -0.143))
  frames <- c(list(defaults), list(...))
  a <- rep(NA, length(Base_Past))
  tmpdat <- data.frame(row = seq_along(Base_Past), Base_Past = Base_Past, a = a)
  for (frame in frames) {
    tmpdat <- merge(tmpdat, frame, by = "Base_Past", suffixes = c("", ".y"),
                    all.x = TRUE, sort = FALSE)
    tmpdat$a <- ifelse(is.na(tmpdat$a), tmpdat$a.y, tmpdat$a)
    tmpdat$a.y <- NULL
  }
  tmpdat <- tmpdat[order(tmpdat$row),]
  AADTMAJ * L * 365 * 10^(-6) * exp(tmpdat$a)
}

前提是您在函数中查找 a 的“默认”值实际上与在 override 变量中查找它们相同。我本可以为您提供单个查找字典的 override= 参数,但有时拥有“一个或多个” 类型的参数很有用:也许您有更多比一帧具有其他 a 值,并且您可能希望一次使用它们。这将按照您对单个的期望工作,但如果您有多个,也许是 custom.spfcustom.spf,这将工作(通过在调用时的 L 参数)。

我选择保留函数简单基 R 的内部结构有几个原因,但没有什么特别重要的。可以进行 dplyr 化的部分位于 for(帧中的帧)循环内。

data %>%
  mutate(a = helper_function(Base_Past, AADTMAJ, L, custom.spf))
#         Base_Past AADTMAJ L          a
# 1 HSM-RUR2U-KABCO     100 1 0.02671733
# 2 HSM-RUR2U-KABCO     100 1 0.02671733
# 3   HSM-RUR4-KABC     100 1 0.03163652
# 4  HSM-RUR4-KABCO     100 1 0.03299356

如果您愿意,该函数应该在分组(group_byrowwise)内干净地运行,但当然没有必要执行您最初要求的操作。

The function and your use of it have several problems. Notable on the list of problems since my first batch of comments:

  • You call it within a rowwise pipe but then pass data=data, which means that it is ignoring the data coming in the pipe and instead looking at the whole thing. You might instead use data=cur_data() (since it is inside of a mutate, this works, as cur_data() is defined by dplyr for situations something like this).

  • Your helper_function is ill-defined by assuming that custom.spf is defined and available. Having a function rely on the presence of external variables not explicitly passed to it makes it fragile and can be rather difficult to troubleshoot. If for instance custom.spf were not defined in the calling environment, then this function will fail with object 'custom.spf' not found. Instead, I think you could use:

    helper_function <- function(..., override=NA) {
      if (isTRUE(is.na(override)) && exists("custom.spf")) {
        message("found 'custom.spf', using it as 'override'")
        override <- custom.spf
      }
      ...
    }
    

    I'm not totally thrilled with this still, but at least it won't fail too quickly, and is verbose in what it is doing.

  • Using 1:nrow(.) can be a little risky if used programmatically. That is, if for some reason one of the inputs has 0 rows (perhaps custom.spf has nothing to override), then 1:nrow(.) should logically do nothing but instead will iterate twice over rows that do not exist. That is, if nrow(.) is 0, then note that 1:0 returns c(1, 0), which is clearly not "do nothing". Instead, use seq_len(nrow(.)), as seq_len(0) returns integer(0), which is what we would want.

  • There is no reason to use rowwise() here, and its use should be avoided whenever possible. (It does what it does very well, and when it is truly necessary, it works great. But the performance penalty for iterating one row at a time can be significant, especially for larger data.)

Some of what you are trying to do can be simplified by learning about merge/join methods. Two really good references for merge/join are: How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?.

Further, it seems as if a significant portion of your effort is to assign a reasonable value to a for your equation. Your inner code (looking for "-KABCO" and "-KABC") looks like it really should be yet another frame of default values.

Here's a suggested helper_function that changes things slightly. It takes as mandatory arguments Base_Past, AADTMAJ, and L, and then zero or more frames to merge/join in order to find an appropriate value for a in the equation.

helper_function <- function(Base_Past, AADTMAJ, L, ...) {
  stopifnot(
    length(Base_Past) == length(AADTMAJ),
    length(Base_Past) == length(L)
  )
  defaults <- data.frame(Base_Past = c("HSM-RUR4-KABCO", "HSM-RUR4-KABC"), a = c(-0.101, -0.143))
  frames <- c(list(defaults), list(...))
  a <- rep(NA, length(Base_Past))
  tmpdat <- data.frame(row = seq_along(Base_Past), Base_Past = Base_Past, a = a)
  for (frame in frames) {
    tmpdat <- merge(tmpdat, frame, by = "Base_Past", suffixes = c("", ".y"),
                    all.x = TRUE, sort = FALSE)
    tmpdat$a <- ifelse(is.na(tmpdat$a), tmpdat$a.y, tmpdat$a)
    tmpdat$a.y <- NULL
  }
  tmpdat <- tmpdat[order(tmpdat$row),]
  AADTMAJ * L * 365 * 10^(-6) * exp(tmpdat$a)
}

The premise is that you looking for "default" values of a in your function is really the same as looking them up in your override variable. I could have given you the override= argument for a single lookup dictionary, but it is sometimes useful to have a "one or more" type of argument: perhaps you have more than one frame with other values for a, and you may want to use them all at once. This will work as you desired for a single, but if you have multiple, perhaps custom.spf and custom.spf, this would work (by adding all of them after the L argument when called).

I chose to keep the internals of the function simple base R for a few reasons, nothing that stands out as critical. The portion that could be dplyr-ized is within the for (frame in frames) loop.

data %>%
  mutate(a = helper_function(Base_Past, AADTMAJ, L, custom.spf))
#         Base_Past AADTMAJ L          a
# 1 HSM-RUR2U-KABCO     100 1 0.02671733
# 2 HSM-RUR2U-KABCO     100 1 0.02671733
# 3   HSM-RUR4-KABC     100 1 0.03163652
# 4  HSM-RUR4-KABCO     100 1 0.03299356

The function should operate cleanly within grouping (group_by or rowwise) if you desire, but it is certainly not necessary to do what you asked originally.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文