Comparing multiple columns for equality

Posted on 2025-01-27 16:08:24

This may be a very elementary question about dplyr and tidyverse tools, but I couldn't find a good way to do it.

Suppose I have a data frame in wide format, and I want to keep only the rows in which a subset of columns all hold the same value. Naively, I can do the following:

library(dplyr)

df <- tribble(
    ~name, ~id,  ~cost, ~value1 , ~value2, ~value3,
    "a",     1,     10,       1,        1,       1,
    "a",     2,     20,       1,        2,       1,
    "b",     3,     50,       1,        1,       3,
    "b",     4,     45,       1,        1,       1,
    "b",     5,     70,       2,        2,       2
)

# Keep rows where every pair of value columns is equal
df %>% filter(
    value1 == value2 &
    value1 == value3 &
    value2 == value3
)

# A tibble: 3 × 6
  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2

Now suppose the number of columns to be compared is very large (> 10). All of them start with value, so we may have value_something, value_otherthing, value_morething, i.e., the suffixes are not necessarily numeric as in this example. If the number of columns is n, the naive approach requires n * (n - 1) / 2 pairwise comparisons, which is clearly unmanageable.

Is there something like

df %>% filter(all_same(starts_with("value")))

where all_same() compares all the columns selected by starts_with() (or any other selector)?

rowwise() and across() didn't help me much either.

Comments (2)

琉璃繁缕 2025-02-03 16:08:24

We can use if_all() to loop over the columns from value2 to value3 and check whether each one equals value1. if_all() returns TRUE only for rows where every one of those comparisons is TRUE.

library(dplyr)
df %>%
    filter(if_all(value2:value3, ~ value1 == .x))

Output

# A tibble: 3 × 6
  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2

Or, to select the columns with starts_with() (value1 is then also compared with itself, which is harmless):

df %>%
    filter(if_all(starts_with('value'), ~ value1 == .x))
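
The answer above hard-codes value1 as the reference column. As a minimal sketch, something close to the all_same() the question asks for could be written as below. Note that all_same() is a hypothetical helper defined here, not part of dplyr, and the sketch assumes dplyr 1.1.0 or later so that pick() is available.

```r
library(dplyr)

df <- tribble(
  ~name, ~id, ~cost, ~value1, ~value2, ~value3,
  "a",     1,    10,       1,       1,       1,
  "a",     2,    20,       1,       2,       1,
  "b",     3,    50,       1,       1,       3,
  "b",     4,    45,       1,       1,       1,
  "b",     5,    70,       2,       2,       2
)

# Hypothetical helper (not a dplyr function): takes a data frame of the
# selected columns and returns one logical per row, TRUE when every
# column equals the first selected column in that row.
all_same <- function(cols) {
  cols <- as.list(cols)
  Reduce(`&`, lapply(cols, function(x) x == cols[[1]]), TRUE)
}

# pick() (assumed: dplyr >= 1.1.0) hands the selected columns to the helper
result <- df %>% filter(all_same(pick(starts_with("value"))))
```

Comparing each column against the first needs only n - 1 comparisons rather than n * (n - 1) / 2, because equality is transitive. Rows containing NA in any selected column produce NA and are dropped by filter().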

纸伞微斜 2025-02-03 16:08:24

Here's a possible base R option, where we can count the number of unique values to see if there is only 1 for each row (and just for the "value" columns).

df[apply(df[, -c(1:3)], 1, function(x) length(unique(x)) == 1), ]

Or another option is to use startsWith to select the columns that start with "value" (instead of indices).

df[apply(df[, startsWith(names(df), "value")], 1, function(x)
  length(unique(x)) == 1),]

Output

  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2
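
One thing to keep in mind with the apply() approach is that apply() converts the selected columns to a matrix before iterating over rows, so columns of mixed types are coerced to a common type first. A possible column-wise alternative in base R, sketched here under the assumption that all value columns share a type, compares every value column against the first one. The data frame is rebuilt with data.frame() only to keep the example self-contained, and the names vals, same, and keep are illustrative.

```r
df <- data.frame(
  name   = c("a", "a", "b", "b", "b"),
  id     = 1:5,
  cost   = c(10, 20, 50, 45, 70),
  value1 = c(1, 1, 1, 1, 2),
  value2 = c(1, 2, 1, 1, 2),
  value3 = c(1, 1, 3, 1, 2)
)

# Select the columns whose names start with "value"
vals <- df[startsWith(names(df), "value")]

# One logical column per value column: does it equal the first value column?
same <- vapply(vals, function(col) col == vals[[1]], logical(nrow(vals)))

# Keep rows with no mismatching column
keep <- rowSums(!same) == 0
result <- df[keep, ]
```

This works on whole columns at a time instead of calling a function once per row, which tends to matter on large data frames.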