在R中的数据框中的数百个变量的计数，条件和星座变量

发布于 2025-02-01 12:02:39 字数 1475 浏览 2 评论 0原文

我正在使用一个数据集，当时我需要评估数百列，以通过行逐行创建新变量。我有三个新变量，一个需要“或”操作员来决定〜100列之间是否有“是”。第二个需要对变量进行计数，我总共有多少个“是”，第三个需要创建一个星座变量，向我显示带有“是”值的变量的名称，所有这些都按了行。我有前两个的代码，但是对于第三个代码，我被卡住了。另外，例如，我仅使用几个变量，但我需要使用〜100个变量。我的代码如下：

#making the data - I am using actually ~100 variables 

test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))

# code for the first two variables: is.positive and number.pos - not elegant nor efficient since I #need to work with ~100 vars

final.data <- data.frame(test.data %>%
              mutate(is.positive = ifelse(var1=="yes" | var2=="yes" | var3=="yes" | var4=="yes", 1,
                                       ifelse((is.na(var1) | var1=="N/A") & 
                                                (is.na(var2) | var2=="N/A") & 
                                                (is.na(var3) | var3=="N/A") & 
                                                (is.na(var4) | var4=="N/A"), NA, 0))) %>%
              rowwise() %>%
              mutate(number.pos = sum(c_across(c(var1, var2, var3, var4))=="yes",na.rm=TRUE)))

原文

I am working with a dataset where I need to evaluate hundreds of columns at the time to create new variables with computations by row. I have three new variables, one needs the "or" operator to decide if there is any "yes" across the ~100 columns. The second one needs to count across the variables how many "yes" I have in total, and the third one needs to create a constellation variable that shows me the name of variables with the "yes" value, all of this by row. I have the code for the first two, but for the third one I am stuck. Also, I am using only a few variables for example purposes but I have ~100 variables that I need to use. My code is below:

#making the data - I am using actually ~100 variables 

test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))

# code for the first two variables: is.positive and number.pos - not elegant nor efficient since I #need to work with ~100 vars

final.data <- data.frame(test.data %>%
              mutate(is.positive = ifelse(var1=="yes" | var2=="yes" | var3=="yes" | var4=="yes", 1,
                                       ifelse((is.na(var1) | var1=="N/A") & 
                                                (is.na(var2) | var2=="N/A") & 
                                                (is.na(var3) | var3=="N/A") & 
                                                (is.na(var4) | var4=="N/A"), NA, 0))) %>%
              rowwise() %>%
              mutate(number.pos = sum(c_across(c(var1, var2, var3, var4))=="yes",na.rm=TRUE)))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

囍笑 2025-02-08 12:02:40

您可以通过制作一个列表列来做到这一点，然后从中得出其他值。

library(tidyverse)
test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))


nv <- test.data %>% 
  select(var1:var4) %>% 
  names()
out <- test.data %>% 
  rowwise() %>% 
  mutate(which_pos = list(nv[which(c_across(var1:var4) == "yes")]), 
         num.positive = length(which_pos), 
         is.positive = num.positive > 0) 

out
#> # A tibble: 6 × 7
#> # Rowwise: 
#>   var1  var2  var3  var4  which_pos num.positive is.positive
#>   <chr> <chr> <chr> <chr> <list>           <int> <lgl>      
#> 1 yes   <NA>  yes   N/A   <chr [2]>            2 TRUE       
#> 2 no    <NA>  yes   yes   <chr [2]>            2 TRUE       
#> 3 no    yes   yes   no    <chr [2]>            2 TRUE       
#> 4 N/A   no    no    no    <chr [0]>            0 FALSE      
#> 5 <NA>  yes   yes   yes   <chr [3]>            3 TRUE       
#> 6 <NA>  <NA>  N/A   <NA>  <chr [0]>            0 FALSE
out$which_pos
#> [[1]]
#> [1] "var1" "var3"
#> 
#> [[2]]
#> [1] "var3" "var4"
#> 
#> [[3]]
#> [1] "var2" "var3"
#> 
#> [[4]]
#> character(0)
#> 
#> [[5]]
#> [1] "var2" "var3" "var4"
#> 
#> [[6]]
#> character(0)

^由

创建您想要一个变量的普通列，以识别哪些是正面的列，您可以简单地将名称粘贴在一起，以创建一个具有逗号分隔名称的字符串：

library(tidyverse)
test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))


nv <- test.data %>% 
  select(var1:var4) %>% 
  names()
out <- test.data %>% 
  rowwise() %>% 
  mutate(which_pos = paste(nv[which(c_across(var1:var4) == "yes")], collapse=","), 
         num.positive = sum(c_across(var1:var4) == "yes", na.rm=TRUE), 
         is.positive = num.positive > 0) 

out
#> # A tibble: 6 × 7
#> # Rowwise: 
#>   var1  var2  var3  var4  which_pos        num.positive is.positive
#>   <chr> <chr> <chr> <chr> <chr>                   <int> <lgl>      
#> 1 yes   <NA>  yes   N/A   "var1,var3"                 2 TRUE       
#> 2 no    <NA>  yes   yes   "var3,var4"                 2 TRUE       
#> 3 no    yes   yes   no    "var2,var3"                 2 TRUE       
#> 4 N/A   no    no    no    ""                          0 FALSE      
#> 5 <NA>  yes   yes   yes   "var2,var3,var4"            3 TRUE       
#> 6 <NA>  <NA>  N/A   <NA>  ""                          0 FALSE

^{由 reprex package （v2.0.1）}

列表列可能在随后的分析中更易于使用，但是comma-eppareated comma-parparewed可变可能更容易用于视觉检查。

You could do it by making a list column for which ones are positive and then deriving the other values from that.

library(tidyverse)
test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))


nv <- test.data %>% 
  select(var1:var4) %>% 
  names()
out <- test.data %>% 
  rowwise() %>% 
  mutate(which_pos = list(nv[which(c_across(var1:var4) == "yes")]), 
         num.positive = length(which_pos), 
         is.positive = num.positive > 0) 

out
#> # A tibble: 6 × 7
#> # Rowwise: 
#>   var1  var2  var3  var4  which_pos num.positive is.positive
#>   <chr> <chr> <chr> <chr> <list>           <int> <lgl>      
#> 1 yes   <NA>  yes   N/A   <chr [2]>            2 TRUE       
#> 2 no    <NA>  yes   yes   <chr [2]>            2 TRUE       
#> 3 no    yes   yes   no    <chr [2]>            2 TRUE       
#> 4 N/A   no    no    no    <chr [0]>            0 FALSE      
#> 5 <NA>  yes   yes   yes   <chr [3]>            3 TRUE       
#> 6 <NA>  <NA>  N/A   <NA>  <chr [0]>            0 FALSE
out$which_pos
#> [[1]]
#> [1] "var1" "var3"
#> 
#> [[2]]
#> [1] "var3" "var4"
#> 
#> [[3]]
#> [1] "var2" "var3"
#> 
#> [[4]]
#> character(0)
#> 
#> [[5]]
#> [1] "var2" "var3" "var4"
#> 
#> [[6]]
#> character(0)

^{Created on 2022-05-26 by the reprex package (v2.0.1)}

If you wanted a normal column for the variable identifying which ones are positive, you could simply paste the names together to create a string that has comma-separated names:

library(tidyverse)
test.data <- data.frame(var1 = c("yes", "no", "no", "N/A", NA, NA),
                        var2 = c(NA, NA, "yes", "no", "yes", NA),
                        var3 = c("yes", "yes", "yes", "no", "yes", "N/A"),
                        var4 = c("N/A", "yes", "no", "no", "yes", NA))


nv <- test.data %>% 
  select(var1:var4) %>% 
  names()
out <- test.data %>% 
  rowwise() %>% 
  mutate(which_pos = paste(nv[which(c_across(var1:var4) == "yes")], collapse=","), 
         num.positive = sum(c_across(var1:var4) == "yes", na.rm=TRUE), 
         is.positive = num.positive > 0) 

out
#> # A tibble: 6 × 7
#> # Rowwise: 
#>   var1  var2  var3  var4  which_pos        num.positive is.positive
#>   <chr> <chr> <chr> <chr> <chr>                   <int> <lgl>      
#> 1 yes   <NA>  yes   N/A   "var1,var3"                 2 TRUE       
#> 2 no    <NA>  yes   yes   "var3,var4"                 2 TRUE       
#> 3 no    yes   yes   no    "var2,var3"                 2 TRUE       
#> 4 N/A   no    no    no    ""                          0 FALSE      
#> 5 <NA>  yes   yes   yes   "var2,var3,var4"            3 TRUE       
#> 6 <NA>  <NA>  N/A   <NA>  ""                          0 FALSE

^{Created on 2022-05-26 by the reprex package (v2.0.1)}

The list column might be easier to use in subsequent analyses if needed, but the comma-separated variable maybe easier to use for visual inspection.

回复收藏 0 原文

捂风挽笑 2025-02-08 12:02:40

使用基本R：

is.na(test.data) <- test.data == 'N/A'
idx <- test.data == 'yes'
test.data['num.positive'] <- rowSums(idx, na.rm = TRUE)
test.data['is.positive'] <- +(test.data[['num.positive']] > 0)

idx2 <- data.frame(which(idx, TRUE))
df1 <- aggregate(col~row, idx2, \(x)paste(names(test.data)[x], collapse = '-'))
df2 <- merge(cbind(test.data, row = seq(nrow(test.data))), df1, all.x =TRUE)

df2

  row var1 var2 var3 var4 num.positive is.positive            col
1   1  yes <NA>  yes <NA>            2           1      var1-var3
2   2   no <NA>  yes  yes            2           1      var3-var4
3   3   no  yes  yes   no            2           1      var2-var3
4   4 <NA>   no   no   no            0           0           <NA>
5   5 <NA>  yes  yes  yes            3           1 var2-var3-var4
6   6 <NA> <NA> <NA> <NA>            0           0           <NA>

Using Base R:

is.na(test.data) <- test.data == 'N/A'
idx <- test.data == 'yes'
test.data['num.positive'] <- rowSums(idx, na.rm = TRUE)
test.data['is.positive'] <- +(test.data[['num.positive']] > 0)

idx2 <- data.frame(which(idx, TRUE))
df1 <- aggregate(col~row, idx2, \(x)paste(names(test.data)[x], collapse = '-'))
df2 <- merge(cbind(test.data, row = seq(nrow(test.data))), df1, all.x =TRUE)

df2

  row var1 var2 var3 var4 num.positive is.positive            col
1   1  yes <NA>  yes <NA>            2           1      var1-var3
2   2   no <NA>  yes  yes            2           1      var3-var4
3   3   no  yes  yes   no            2           1      var2-var3
4   4 <NA>   no   no   no            0           0           <NA>
5   5 <NA>  yes  yes  yes            3           1 var2-var3-var4
6   6 <NA> <NA> <NA> <NA>            0           0           <NA>

回复收藏 0 原文

~没有更多了~