如何解决 R 中 for 循环的问题
我正在为想要根据 AADTMAJ、L 和 Base_Past 预测值的人们编写一个包。该函数提供两个选项:1)允许用户输入自己的回归系数,或2)为用户提供预定义的系数。但是,我无法正确使用 return() 。
输入数据
data=data.frame(Base_Past=c("HSM-RUR2U-KABCO",
"HSM-RUR2U-KABCO",
"HSM-RUR4-KABC",
"HSM-RUR4-KABCO"),
AADTMAJ=c(100,100,100,100),
L=c(1,1,1,1)
)
输入自定义回归系数
custom.spf=data.frame(Base_Past=c("HSM-RUR2U-KABCO","HSM-RUR2U-KABC"), a=c(-0.312,-0.19))
定义辅助函数
helper_function = function (data, Base_Past=FALSE, override=custom.spf){
if (is.data.frame(override)){
for (j in 1:nrow(override)){
for (i in 1:nrow(data)){
if(data[i, ]$Base_Past==override[j, ]$Base_Past){
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(override[j, ]$a))
return(output)} else{
if(data[i, ]$Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
if(data[i, ]$Base_Past=="HSM-RUR4-KABC") {a=-0.143}
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
return(output)
}
}
}
}
else if (!is.data.frame(override)){
if(Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
if(Base_Past=="HSM-RUR4-KABC") {a=-0.143}
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
return(output)
}
}
运行
(data %>% dplyr::rowwise() %>% dplyr::mutate(predicted_value = helper_function(data = data, override=custom.spf)))[,4]
输出
# A tibble: 4 x 1
# Rowwise:
predicted_value
<dbl>
1 0.0267
2 0.0267
3 0.0267
4 0.0267
替代
data %>% dplyr::mutate(predicted_value=dplyr::case_when(Base_Past =="HSM-RUR4-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.101),
Base_Past=="HSM-RUR4-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.143),
Base_Past=="HSM-RUR2U-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.312),
Base_Past=="HSM-RUR2U-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.190),
TRUE ~ NA_real_))
所需输出
Base_Past AADTMAJ L predicted_value
1 HSM-RUR2U-KABCO 100 1 0.02671733
2 HSM-RUR2U-KABCO 100 1 0.02671733
3 HSM-RUR4-KABC 100 1 0.03163652
4 HSM-RUR4-KABCO 100 1 0.03299356
i am writing a package for folks who want to predict values base on AADTMAJ, L, and Base_Past. The function provides two options 1) allow the user to enter there own regression coefficients, or 2) provide the user with pre defined coefficients. However, i have not been able to use return() correctly .
input data
data=data.frame(Base_Past=c("HSM-RUR2U-KABCO",
"HSM-RUR2U-KABCO",
"HSM-RUR4-KABC",
"HSM-RUR4-KABCO"),
AADTMAJ=c(100,100,100,100),
L=c(1,1,1,1)
)
input custom regression coefficients
custom.spf=data.frame(Base_Past=c("HSM-RUR2U-KABCO","HSM-RUR2U-KABC"), a=c(-0.312,-0.19))
define helper function
helper_function = function (data, Base_Past=FALSE, override=custom.spf){
if (is.data.frame(override)){
for (j in 1:nrow(override)){
for (i in 1:nrow(data)){
if(data[i, ]$Base_Past==override[j, ]$Base_Past){
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(override[j, ]$a))
return(output)} else{
if(data[i, ]$Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
if(data[i, ]$Base_Past=="HSM-RUR4-KABC") {a=-0.143}
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
return(output)
}
}
}
}
else if (!is.data.frame(override)){
if(Base_Past=="HSM-RUR4-KABCO") {a=-0.101}
if(Base_Past=="HSM-RUR4-KABC") {a=-0.143}
output=as.numeric(data[i, ]$AADTMAJ*data[i, ]$L*365*10^(-6)*exp(a))
return(output)
}
}
run
(data %>% dplyr::rowwise() %>% dplyr::mutate(predicted_value = helper_function(data = data, override=custom.spf)))[,4]
Output
# A tibble: 4 x 1
# Rowwise:
predicted_value
<dbl>
1 0.0267
2 0.0267
3 0.0267
4 0.0267
alternative
data %>% dplyr::mutate(predicted_value=dplyr::case_when(Base_Past =="HSM-RUR4-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.101),
Base_Past=="HSM-RUR4-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.143),
Base_Past=="HSM-RUR2U-KABCO" ~AADTMAJ*L*365*10^(-6)*exp(-0.312),
Base_Past=="HSM-RUR2U-KABC" ~AADTMAJ*L*365*10^(-6)*exp(-0.190),
TRUE ~ NA_real_))
desired output
Base_Past AADTMAJ L predicted_value
1 HSM-RUR2U-KABCO 100 1 0.02671733
2 HSM-RUR2U-KABCO 100 1 0.02671733
3 HSM-RUR4-KABC 100 1 0.03163652
4 HSM-RUR4-KABCO 100 1 0.03299356
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
该功能以及您对它的使用存在几个问题。自我第一批评论以来,问题列表中值得注意的是:
您在
rowwise
管道中调用它,但随后传递data=data
,这意味着它是 <忽略管道中的数据,而是着眼于整个事情。您可以改为使用data=cur_data()
(因为它位于mutate
内部,所以这是有效的,因为cur_data()
是由定义的code>dplyr
对于类似这样的情况)。假设
custom.spf
已定义且可用,您的helper_function
定义不明确。函数依赖于未显式传递给它的外部变量的存在,这使得函数变得脆弱并且很难进行故障排除。例如,如果在调用环境中未定义custom.spf
,则此函数将失败,并找不到对象“custom.spf”
。相反,我认为你可以使用:我仍然对此并不完全兴奋,但至少它不会太快失败,并且它所做的事情很详细。
如果以编程方式使用
1:nrow(.)
可能有一点风险。也就是说,如果由于某种原因,其中一个输入有 0 行(也许custom.spf
没有任何可覆盖的内容),那么1:nrow(.)
逻辑上应该不执行任何操作而是会对不存在的行进行两次迭代。也就是说,如果nrow(.)
为 0,那么请注意1:0
返回c(1, 0)
,这显然不是“什么都不做”。相反,请使用seq_len(nrow(.))
,因为seq_len(0)
返回integer(0)
,这正是我们想要的。这里没有理由使用
rowwise()
,并且应尽可能避免使用它。 (它做得很好,当确实有必要时,它效果很好。但是一次迭代一行的性能损失可能很大,特别是对于较大的数据。)您尝试做的一些事情可以通过学习合并/连接方法来简化。关于合并/连接的两个非常好的参考是:如何连接(合并)数据框(内部、外部、左、右), INNER JOIN、LEFT JOIN、RIGHT JOIN 有什么区别和完全加入?。
此外,您的工作的很大一部分似乎是为方程的
a
分配一个合理的值。您的内部代码(寻找"-KABCO"
和"-KABC"
)看起来确实应该是另一个默认值框架。这里有一个建议的
helper_function
,它会稍微改变一些事情。它采用Base_Past
、AADTMAJ
和L
作为强制参数,然后合并/连接零个或多个帧以找到合适的值对于等式中的a
。前提是您在函数中查找
a
的“默认”值实际上与在override
变量中查找它们相同。我本可以为您提供单个查找字典的override=
参数,但有时拥有“一个或多个” 类型的参数很有用:也许您有更多比一帧具有其他a
值,并且您可能希望一次使用它们。这将按照您对单个的期望工作,但如果您有多个,也许是custom.spf
和custom.spf
,这将工作(通过在调用时的L
参数)。我选择保留函数简单基 R 的内部结构有几个原因,但没有什么特别重要的。可以进行 dplyr 化的部分位于 for(帧中的帧)循环内。
如果您愿意,该函数应该在分组(
group_by
或rowwise
)内干净地运行,但当然没有必要执行您最初要求的操作。The function and your use of it have several problems. Notable on the list of problems since my first batch of comments:
You call it within a
rowwise
pipe but then passdata=data
, which means that it is ignoring the data coming in the pipe and instead looking at the whole thing. You might instead usedata=cur_data()
(since it is inside of amutate
, this works, ascur_data()
is defined bydplyr
for situations something like this).Your
helper_function
is ill-defined by assuming thatcustom.spf
is defined and available. Having a function rely on the presence of external variables not explicitly passed to it makes it fragile and can be rather difficult to troubleshoot. If for instancecustom.spf
were not defined in the calling environment, then this function will fail withobject 'custom.spf' not found
. Instead, I think you could use:I'm not totally thrilled with this still, but at least it won't fail too quickly, and is verbose in what it is doing.
Using
1:nrow(.)
can be a little risky if used programmatically. That is, if for some reason one of the inputs has 0 rows (perhapscustom.spf
has nothing to override), then1:nrow(.)
should logically do nothing but instead will iterate twice over rows that do not exist. That is, ifnrow(.)
is 0, then note that1:0
returnsc(1, 0)
, which is clearly not "do nothing". Instead, useseq_len(nrow(.))
, asseq_len(0)
returnsinteger(0)
, which is what we would want.There is no reason to use
rowwise()
here, and its use should be avoided whenever possible. (It does what it does very well, and when it is truly necessary, it works great. But the performance penalty for iterating one row at a time can be significant, especially for larger data.)Some of what you are trying to do can be simplified by learning about merge/join methods. Two really good references for merge/join are: How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?.
Further, it seems as if a significant portion of your effort is to assign a reasonable value to
a
for your equation. Your inner code (looking for"-KABCO"
and"-KABC"
) looks like it really should be yet another frame of default values.Here's a suggested
helper_function
that changes things slightly. It takes as mandatory argumentsBase_Past
,AADTMAJ
, andL
, and then zero or more frames to merge/join in order to find an appropriate value fora
in the equation.The premise is that you looking for "default" values of
a
in your function is really the same as looking them up in youroverride
variable. I could have given you theoverride=
argument for a single lookup dictionary, but it is sometimes useful to have a "one or more" type of argument: perhaps you have more than one frame with other values fora
, and you may want to use them all at once. This will work as you desired for a single, but if you have multiple, perhapscustom.spf
andcustom.spf
, this would work (by adding all of them after theL
argument when called).I chose to keep the internals of the function simple base R for a few reasons, nothing that stands out as critical. The portion that could be
dplyr
-ized is within thefor (frame in frames)
loop.The function should operate cleanly within grouping (
group_by
orrowwise
) if you desire, but it is certainly not necessary to do what you asked originally.