How do I put the results of the summarise() function into a data frame in R?

Posted 2025-01-28 00:35:38, 3057 characters, 5 views, 0 comments

This question follows up on an earlier one (how to put the results of summarise() function into the dataframe in r).

I don't think I conveyed my question well there, so I have added more details.

I made a minimal reproducible example, but my real data is much larger.

a_p_ <-c(0.1, 0.3, 0.03, 0.03)
b_p_ <-c(0.2, 0.003, 0.1, 0.00001)
c_2<-c(1,2,5,23)
c_p_<-c(0.001, 0.002,0.002,0.00001)
results_1<-data.frame(a_p_,b_p_,c_2,c_p_)

a_p_ <-c(0.3, 0.02, 0.43, 0.44)
b_p_ <-c(0.00002, 0.3, 0.8, 0.005)
c_2 <-c(88,4,55,88)
c_p_<-c(0.1, 0.002,0.002,0.1)

results_2<-data.frame(a_p_,b_p_,c_2,c_p_)

So I have two datasets: one is "results_1" and the other is "results_2".
This is just a reproducible example, though.
In my real data I have 200 results files
(from "results_1" to "results_200").

I then want to create a new data frame (named type1error)
that contains the following summaries, one row per dataset.

More specifically, I want this to be the first row of my new data frame (type1error):

>   results_1 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1  0.5  0.5    0

and this to be the second row of my data frame (type1error):

> results_2 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1 0.75  0.5  0.5

So what I did is:

# make empty holder

type1error<-as.data.frame(matrix(nrow = 2))

for(i in 1:2){
  # read the data 
  if(i==1){
    results<-results_1
  }
  if(i==2){
    results<-results_2
  }
  

  
  # mean() You can use mean() to get the proportion of TRUE of a logical vector.
  type1error[i,]<-results %>%
    summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  
  type1error$conditions[i] <- i 
  
}

But I got warning messages like these, and the result does not seem to be what I expected
(one row of summarised results per dataset):

Warning messages:
1: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.5, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables
2: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.75, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables

How can I fix this?

The code below is not for this example dataset but for my real dataset,
and it produces the same problem.

# FYI, not reproducible, but the code that I used for my real, huge data is as follows:

ncond<-200

#empty holder 

type1error<-as.data.frame(matrix(nrow = ncond))

for(i in 1:ncond){
# read the data 
results <- read.csv(paste0("model_results/results_",i,".csv"))
 

# mean() You can use mean() to get the proportion of TRUE of a logical vector.
type1error[i,]<-results %>%
  summarise(across(contains("_p_"), ~ mean(.x > 0.05)))

type1error$conditions[i] <- i 

}
# one csv file in type 1 error rate 
# fixed
write.csv(type1error,"type1error/type1error.csv")

#and this code chunk did not work well. 

I appreciate all the answers on the previous question page!

The answers there were all written for "results_1" and "results_2",
because my reproducible example has only two datasets.

In reality, however, I have 200 datasets
(from "results_1" to "results_200"),

and I need to end up with a new data frame, not a list.


Comments (1)

水晶透心 2025-02-04 00:35:38

You can use map() and bind_rows() to work with a list and get a data frame as the output.

map() (purrr package) takes a list or vector, applies a function to each element, and returns a list; bind_rows() (dplyr) can then stack those elements into a single data frame.

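library(dplyr)  # summarise(), across(), bind_rows()
library(purrr)  # map()

# collect the data frames in a list
ResultList <- list(results_1, results_2)

# proportion of values above 0.05 in every column whose name contains "_p_"
sumit <- function(x) {
  summarise(x, across(contains("_p_"), ~ mean(.x > 0.05)))
}

# one single-row data frame per element of ResultList
FinalResult <- map(ResultList, ~sumit(.x))

# stack the single-row results into one data frame
Type1Error <- bind_rows(FinalResult)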
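
As a self-contained illustration, the steps above can be run against the sample data from the question. The sketch below also adds the `conditions` column the question asks for by using the `.id` argument of `bind_rows()`; naming the output `type1error` and converting the id to an integer are choices made here for illustration, not part of the original answer.

```r
library(dplyr)
library(purrr)

# Sample data copied from the question
results_1 <- data.frame(
  a_p_ = c(0.1, 0.3, 0.03, 0.03),
  b_p_ = c(0.2, 0.003, 0.1, 0.00001),
  c_2  = c(1, 2, 5, 23),
  c_p_ = c(0.001, 0.002, 0.002, 0.00001)
)
results_2 <- data.frame(
  a_p_ = c(0.3, 0.02, 0.43, 0.44),
  b_p_ = c(0.00002, 0.3, 0.8, 0.005),
  c_2  = c(88, 4, 55, 88),
  c_p_ = c(0.1, 0.002, 0.002, 0.1)
)

ResultList <- list(results_1, results_2)

# Proportion of values above 0.05 in every "_p_" column
sumit <- function(x) {
  summarise(x, across(contains("_p_"), ~ mean(.x > 0.05)))
}

# For an unnamed list, .id records each element's position ("1", "2", ...)
type1error <- map(ResultList, sumit) %>%
  bind_rows(.id = "conditions") %>%
  mutate(conditions = as.integer(conditions))

print(type1error)
```

The result is an ordinary data frame with one row per input dataset, so it can be passed straight to `write.csv()`.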

You can also do it as a one-liner in map: map(ResultList, ~summarise(.x, across(contains("_p_"), ~ mean(.x > 0.05))))
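
If you would rather keep the for loop from the question, one possible sketch is to store each one-row summary in a list and bind the rows once at the end, instead of assigning into a pre-built placeholder whose columns do not match the summary. The names `rows` and `type1error` are chosen here for illustration, and the sample data is trimmed to the first two rows of each dataset to keep the block short.

```r
library(dplyr)

# Shortened sample data (first two rows of each dataset in the question)
results_1 <- data.frame(a_p_ = c(0.1, 0.3), b_p_ = c(0.2, 0.003),
                        c_2 = c(1, 2), c_p_ = c(0.001, 0.002))
results_2 <- data.frame(a_p_ = c(0.3, 0.02), b_p_ = c(0.00002, 0.3),
                        c_2 = c(88, 4), c_p_ = c(0.1, 0.002))
ResultList <- list(results_1, results_2)

# Pre-allocate a list with one slot per dataset
rows <- vector("list", length(ResultList))

for (i in seq_along(ResultList)) {
  rows[[i]] <- ResultList[[i]] %>%
    summarise(across(contains("_p_"), ~ mean(.x > 0.05))) %>%
    mutate(conditions = i, .before = 1)  # record which dataset the row came from
}

# Stack the one-row summaries into a single data frame
type1error <- bind_rows(rows)
print(type1error)
```

Because the column names come from the summaries themselves, there is no column-count mismatch of the kind that triggered the warning in the question.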

To get all of your files into a list, you could use map() or lapply().

Edited to include a modified version of the linked solution for reading csv files into a list, assuming you have a folder called "Data" in your R project directory that contains all the files.

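# list the csv files in "Data" without changing the working directory
# note: list.files() returns names in alphabetical order,
# so "results_10.csv" sorts before "results_2.csv"
filenames <- list.files("Data", pattern = "\\.csv$", full.names = TRUE)
# read every file into one element of a list
ResultList <- lapply(filenames, read.csv)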

Solution for reading csv files into a list
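
For the 200-file case in the question, a minimal end-to-end sketch might look like the following. It builds the file paths from the index rather than listing the folder, so row i always corresponds to results_i.csv. The temporary folder and the two small files written into it are stand-ins created here only so the example runs anywhere; with the real data you would point `results_dir` at "model_results" and set `ncond` to 200.

```r
library(dplyr)
library(purrr)

# Hypothetical setup: a temporary folder standing in for "model_results/"
results_dir <- file.path(tempdir(), "model_results")
dir.create(results_dir, showWarnings = FALSE)
write.csv(
  data.frame(a_p_ = c(0.1, 0.3, 0.03, 0.03), c_2 = c(1, 2, 5, 23),
             c_p_ = c(0.001, 0.002, 0.002, 0.00001)),
  file.path(results_dir, "results_1.csv"), row.names = FALSE
)
write.csv(
  data.frame(a_p_ = c(0.3, 0.02, 0.43, 0.44), c_2 = c(88, 4, 55, 88),
             c_p_ = c(0.1, 0.002, 0.002, 0.1)),
  file.path(results_dir, "results_2.csv"), row.names = FALSE
)

ncond <- 2  # 200 for the real data

# Build the paths from the index so the order is 1, 2, ..., ncond
filenames <- file.path(results_dir, paste0("results_", seq_len(ncond), ".csv"))

# Proportion of values above 0.05 in every "_p_" column
sumit <- function(x) {
  summarise(x, across(contains("_p_"), ~ mean(.x > 0.05)))
}

type1error <- map(filenames, read.csv) %>%   # list of data frames
  map(sumit) %>%                             # list of one-row summaries
  bind_rows(.id = "conditions") %>%          # single data frame
  mutate(conditions = as.integer(conditions))

# Write the combined table to one csv file
write.csv(type1error, file.path(results_dir, "type1error.csv"), row.names = FALSE)
```

Reading all files before summarising keeps every data frame in memory at once; if the files are very large, summarising inside the same function that reads each file would avoid that.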
