Running multiple similar lines of R code that differ by one variable each time, to improve readability

Posted on 2025-02-05 17:22:58

I am looking to improve the readability of my code by seeing if there is a way to "loop" or "re-run" lines of code that are very similar but differ by a single variable each time.

My actual data analysis involves running a number of blmer calls from the blme package. Each of my analyses has a dependent variable, an independent variable (of which there are many), a "wave" variable (as data were collected at 3 timepoints), and a unique participant id as a random effect.

I'm trying to build a number of models, all of which are very similar, but each differs on what is entered as the independent variable.

In the code below, I have outlined some more details, built a new, fictitious data file, and tried to recreate models similar to those in my actual file.

The code runs without problem on my real data and here in the fictitious data. What I'd like to draw attention to here is how even with just 3 models included (as is the case in my example below) the code begins to become long and repetitive.

##test script##
library(dplyr)
library(tidyverse)
library(panelr) #needed for long_panel() below
library(blme)
#packages loaded - I'm not sure dplyr and tidyverse are exactly needed, I just loaded
#them in case...but blme is for the Bayesian models coming later
#everything below worked on RStudio on my end but, like I say, I don't
#know if that is because of the above packages or not...

##build a file
DV0 <- c(100, 50, 75, 80, 20, 30) #let's say performance on a soccer task at time 1 - max 100
DV1 <- c(100, 60, 80, 80, 25, 40) #performance on soccer task at time 2
DV2 <- c(95, 55, 70, 70, 20, 35) #performance on soccer task at time 3
IV1.0 <- c(90, 60, 65, 75, 40, 50) #score on cognitive task A at time 1 - max 100
IV1.1 <- c(95, 70, 75, 80, 50, 70) #score on cog task A at time 2 
IV1.2 <- c(90, 55, 60, 70, 45, 60) #score on cog task A at time 3
IV2.0 <- c(10, 40, 50, 60, 20, 25) #score on cognitive task B at time 1 - max 100
IV2.1 <- c(20, 50, 60, 75, 35, 35) #score on cog task B at time 2
IV2.2 <- c(15, 40, 40, 55, 25, 25) #score on cog task B at time 3
id <- c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")

##create a data frame before pivot to a better format for longitudinal data
df <- data.frame(DV0, DV1, DV2, IV1.0, IV1.1, IV1.2, IV2.0, IV2.1, IV2.2,
                 id)
df.long <- long_panel(df, begin = 0, end = 2, label_location = "end")

#now onto the main analyses 
#here I want to use "blmer" from "blme" package to understand how performance
#on the soccer task first is affected by time alone (model1 below). 
#Next, I want to check whether adding performance on cognitive task A
#influences performance (model2 below), before running the same analyses but with
#cognitive task B (model3 below) - in this example I have just two cognitive 
#tasks, but in my real work I have many more IVs to test (let's in this case 
#just say it would be more cognitive tasks). Final thing I plan to add an 
#individual slope and intercept based on the id variable

#time alone and soccer task performance
model1 <- blmer(DV ~ wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model1)

#new experimental model with cognitive tasks A performance added
model2 <- blmer(DV ~ IV1. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model2)
anova(model1, model2)

#a similar experimental model with cognitive task B performance instead of A
model3 <- blmer(DV ~ IV2. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model3)
anova(model1, model3)

#in the real data I then have many more models with IV1. or IV2. changed for 
#another independent variable (e.g., IV3. or IV4.) and as a result the code
#is very long. I'm wanting to know, can the above be put together in fewer 
#lines of code. What I've been reading is maybe that I could loop somewhere
#so that "IV.*" is replaced each time?

#thanks in advance for any help!

So, if you know of a way to run the code for model1, model2, and model3 in this example in fewer lines of code, that would be great.

小帐篷 2025-02-12 17:22:58

You can create a function that receives the independent variable as a string, plus the df and other options, and leverages as.formula(). Then apply the function to each of your independent variables using lapply(). You can use "" as the "independent variable" when running the wave-only model (i.e. model 1).

# Build the formula from a predictor name, then fit the model with blmer()
get_model <- function(ind_var, df, REML = FALSE, fixef.prior = "normal", ...) {
  f <- as.formula(paste0("DV ~ ", ind_var, " + wave + (1 | id)"))
  blmer(f, data = df, REML = REML, fixef.prior = fixef.prior, ...)
}

Now build a list called models:

models = lapply(c("", "IV1.", "IV2."), get_model, df=df.long)
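Passing "" works, but it leaves a stray leading "+" in the formula string. As a minor variation (a sketch only, not part of the answer above), base R's reformulate() can assemble the formula from whichever terms are present. The helper name make_formula and the use of NULL for the wave-only model are assumptions made for illustration; the block needs no packages because it only builds the formulas.

```r
# Hypothetical helper (not from the answer): build the model formula for an
# optional predictor. NULL means "no extra predictor", i.e. the wave-only model.
make_formula <- function(ind_var = NULL) {
  reformulate(c(ind_var, "wave", "(1 | id)"), response = "DV")
}

ind_vars <- c("IV1.", "IV2.")

# A named list of formulas: the baseline first, then one per predictor.
formulas <- c(
  list(baseline = make_formula()),
  setNames(lapply(ind_vars, make_formula), ind_vars)
)

formulas
```

Each element could then take the place of f inside get_model(). Naming the list means later results can be looked up by predictor name rather than by position.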

You can run any anova you like, like this:

anova(models[[1]], models[[3]])

Output:

Data: df
Models:
models[[1]]: DV ~ +wave + (1 | id)
models[[3]]: DV ~ IV2. + wave + (1 | id)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
models[[1]]    4 141.41 144.98 -66.707   133.41                        
models[[3]]    5 133.12 137.57 -61.560   123.12 10.296  1   0.001333 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
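To repeat that comparison for every predictor without indexing by hand, the same lapply() idea can be applied to the fitted models. The sketch below shows only the looping pattern and rests on several assumptions: toy is a hand-built stand-in for df.long using the question's fictitious numbers, fit_one is a hypothetical helper, and lm() is swapped in for blmer() (dropping the random effect) so the block runs without blme or panelr. It is not a substitute for the mixed models.

```r
# Stand-in for df.long (assumption: same column names, built by hand).
toy <- data.frame(
  id   = rep(c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul"), times = 3),
  wave = rep(0:2, each = 6),
  DV   = c(100, 50, 75, 80, 20, 30, 100, 60, 80, 80, 25, 40, 95, 55, 70, 70, 20, 35),
  IV1. = c(90, 60, 65, 75, 40, 50, 95, 70, 75, 80, 50, 70, 90, 55, 60, 70, 45, 60),
  IV2. = c(10, 40, 50, 60, 20, 25, 20, 50, 60, 75, 35, 35, 15, 40, 40, 55, 25, 25)
)

# Hypothetical helper: fit DV ~ [ind_var +] wave. lm() is a stand-in for blmer().
fit_one <- function(ind_var = NULL, data) {
  lm(reformulate(c(ind_var, "wave"), response = "DV"), data = data)
}

baseline <- fit_one(data = toy)
ind_vars <- c("IV1.", "IV2.")
fits <- setNames(lapply(ind_vars, fit_one, data = toy), ind_vars)

# One anova() table per predictor, each compared against the wave-only model.
comparisons <- lapply(fits, function(m) anova(baseline, m))
comparisons[["IV2."]]
```

With the models list built by get_model(), the equivalent line should be lapply(models[-1], function(m) anova(models[[1]], m)), and lapply(models, summary) would collect the summaries in the same way.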

There is another option, which is to make df.long even "longer", and then estimate the models by the grouping variable. Here is an example of doing that with data.table:

library(data.table)
setDT(df.long)

df.longer <- melt(df.long, measure.vars = c("IV1.", "IV2."), variable.name = "ind_var")

rbind(
  df.long[, .(model = list(blmer(DV ~ wave + (1 | id), REML = FALSE, fixef.prior = "normal")))][, ind_var := "None"],
  df.longer[, .(model = list(blmer(DV ~ value + wave + (1 | id), REML = FALSE, fixef.prior = "normal"))), ind_var]
)

The output is a data.table of models:

            model ind_var
           <list>  <fctr>
1: <blmerMod[14]>    None
2: <blmerMod[14]>    IV1.
3: <blmerMod[14]>    IV2.
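The reshape-then-group idea does not depend on data.table. As a rough base-R sketch under stated assumptions (toy is a hand-built stand-in for df.long with the same column names, and lm() replaces blmer() without the random effect so the block runs with no extra packages), the predictor columns can be stacked into one value column and the result split on ind_var, giving one data set per model:

```r
# Stand-in for df.long (assumption: same column names, built by hand).
toy <- data.frame(
  id   = rep(c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul"), times = 3),
  wave = rep(0:2, each = 6),
  DV   = c(100, 50, 75, 80, 20, 30, 100, 60, 80, 80, 25, 40, 95, 55, 70, 70, 20, 35),
  IV1. = c(90, 60, 65, 75, 40, 50, 95, 70, 75, 80, 50, 70, 90, 55, 60, 70, 45, 60),
  IV2. = c(10, 40, 50, 60, 20, 25, 20, 50, 60, 75, 35, 35, 15, 40, 40, 55, 25, 25)
)

ind_vars <- c("IV1.", "IV2.")

# Stack the predictors: one copy of the data per predictor, with the
# predictor's name in `ind_var` and its scores in `value`.
longer <- do.call(rbind, lapply(ind_vars, function(v) {
  data.frame(toy[c("id", "wave", "DV")], ind_var = v, value = toy[[v]])
}))

# Fit the same formula once per group. lm() is a stand-in for blmer().
fits <- lapply(split(longer, longer$ind_var), function(d) {
  lm(DV ~ value + wave, data = d)
})

names(fits)
```

Because every group uses the same formula (DV ~ value + wave), adding another predictor only means adding its name to ind_vars.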

About the author: 客…行舟
