Get a vector of column names based on the logicals of the same data frame

Posted on 2025-02-13 06:34:50

I have a named data frame containing logicals with missing values, and I want to get a vector of the column names where the values are TRUE (going down the rows and, if there are multiple TRUEs in one row, going from left to right). Here is an example:

df <- data.frame(a= c(FALSE, NA, TRUE, TRUE),
                 b= c(TRUE, FALSE, FALSE, NA),
                 c= c(TRUE, TRUE, NA, NA))
df
#       a     b    c
# 1 FALSE  TRUE TRUE
# 2    NA FALSE TRUE
# 3  TRUE FALSE   NA
# 4  TRUE    NA   NA

expected <- c("b", "c", "c", "a", "a")

Going from the first to the last row, we see TRUE in the first row. There are multiple TRUEs here, so we go from left to right and get "b" and "c". In the second row we get "c", and so on.

How can I do this (in an elegant way)?

Comments (5)

相权↑美人 2025-02-20 06:34:50

You can do this in base R:

pos <- which(t(df) == TRUE, arr.ind = TRUE)
names(df)[pos[, "row"]]
# [1] "b" "c" "c" "a" "a"
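
To see why indexing by the "row" column works, it may help to look at the intermediate steps. The following is only a sketch of the same idea split into pieces; the names m, pos and result are chosen here for illustration and are not part of the answer above.

```r
df <- data.frame(a = c(FALSE, NA, TRUE, TRUE),
                 b = c(TRUE, FALSE, FALSE, NA),
                 c = c(TRUE, TRUE, NA, NA))

# Transposing moves the original columns (a, b, c) onto the rows.
# Matrices are stored column by column, so which() now visits the
# cells of one original row at a time, from left to right.
m <- t(df)

# Each row of pos describes one TRUE cell of m:
#   "row" = position of the original column, "col" = original row.
# Cells that are NA or FALSE are not reported by which().
pos <- which(m == TRUE, arr.ind = TRUE)

# Map the original column positions back to column names.
result <- names(df)[pos[, "row"]]
result
```

Printing pos shows the traversal order directly, which is a quick way to confirm that the rows of df are visited top to bottom.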
尘世孤行 2025-02-20 06:34:50

You can also try using apply:

unlist(apply(df, 1, function(x){na.omit(names(df)[x])}))

# [1] "b" "c" "c" "a" "a"
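
The na.omit() call is needed because indexing with a logical NA produces an NA element rather than dropping it. A small sketch for the second row of the example data, with illustrative variable names (row2, picked, cleaned) that do not appear in the answer:

```r
df <- data.frame(a = c(FALSE, NA, TRUE, TRUE),
                 b = c(TRUE, FALSE, FALSE, NA),
                 c = c(TRUE, TRUE, NA, NA))

# Second row as a named logical vector: a = NA, b = FALSE, c = TRUE.
row2 <- unlist(df[2, ])

# Logical indexing keeps TRUE positions, drops FALSE positions,
# and returns NA for every NA position.
picked <- names(df)[row2]

# na.omit() removes those NA entries; as.character() strips the
# bookkeeping attributes that na.omit() attaches.
cleaned <- as.character(na.omit(picked))
cleaned
```

Without na.omit(), every NA cell in df would show up as an NA in the final vector.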
彼岸花ソ最美的依靠 2025-02-20 06:34:50

Here is a tidyverse way:

library(dplyr)
library(tidyr)

vector <- df %>% 
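  # .cols is given explicitly; leaving it empty is deprecated in recent dplyr versions
  mutate(across(everything(), ~ case_when(. == TRUE ~ cur_column()), .names = 'new_{col}')) %>%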
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ', ') %>% 
  separate_rows(New_Col) %>% 
  pull(New_Col)

Or:

library(dplyr)
library(tidyr)

df %>% 
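  # .cols is given explicitly; leaving it empty is deprecated in recent dplyr versions
  mutate(across(everything(), ~ case_when(. == TRUE ~ cur_column()))) %>% 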
  pivot_longer(everything()) %>% 
  na.omit() %>% 
  pull(value)
# [1] "b" "c" "c" "a" "a"
素染倾城色 2025-02-20 06:34:50

Another possible solution, based on purrr::pmap:

library(tidyverse)

pmap(df, ~ names(df)[c(...)] %>% na.omit) %>% unlist

#> [1] "b" "c" "c" "a" "a"
ぃ弥猫深巷。 2025-02-20 06:34:50

You can use %% (modulo) to identify the column indices:

names(df)[(which(t(df)) - 1) %% ncol(df) + 1]

# [1] "b" "c" "c" "a" "a"
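
The arithmetic can be unpacked step by step. This is a sketch on the small example data from the question; idx, col_idx and result are names introduced here for illustration only. Since t(df) has ncol(df) rows, the position of a linear index within each block of ncol(df) cells is the original column position.

```r
df <- data.frame(a = c(FALSE, NA, TRUE, TRUE),
                 b = c(TRUE, FALSE, FALSE, NA),
                 c = c(TRUE, TRUE, NA, NA))

# Linear (column-major) positions of the TRUE cells in the 3 x 4
# transposed matrix; NA cells are skipped by which().
idx <- which(t(df))

# Shift to zero-based, take the remainder modulo the number of
# original columns, then shift back to one-based positions.
col_idx <- (idx - 1) %% ncol(df) + 1

result <- names(df)[col_idx]
result
```

This avoids building the two-column index matrix that arr.ind = TRUE returns, which is a plausible reason for the speed difference seen in the benchmark below, though that explanation is an assumption and not something the benchmark itself demonstrates.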

Benchmark
df <- as.data.frame(matrix(sample(c(TRUE, FALSE, NA), 1e7, TRUE), 1e5, 1e2))

# A data.frame: 100,000 × 100
#     V1    V2    V3    V4    V5 ...
# 1 TRUE  TRUE    NA FALSE FALSE ...
# 2   NA  TRUE  TRUE  TRUE    NA ...
# 3   NA FALSE FALSE FALSE  TRUE ...
# 4   NA FALSE FALSE  TRUE FALSE ...
# 5   NA FALSE FALSE FALSE  TRUE ...

library(microbenchmark)

bm <- microbenchmark(
  Darren = {
    x1 <- names(df)[(which(t(df)) - 1) %% ncol(df) + 1]
  }, Clemsang = {
    x2 <- names(df)[which(t(df) == TRUE, arr.ind = TRUE)[, "row"]]
  })

all(x1 == x2)
# [1] TRUE

bm
# Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval
#    Darren 140.5595 153.3333 163.7934 159.4783 167.5418 284.4146   100
#  Clemsang 219.7802 242.6169 254.9226 250.8673 264.0462 356.9299   100