
Comparing multiple columns for equality

Posted on 2025-01-27 16:08:24


This may be a very elementary question about dplyr and tidyverse tools, but I couldn't find a good way to do it.

Let's suppose I have a data frame in wide format, and I want to filter the rows in which a subset of columns all hold the same value. Naively, I can do the following:

library(dplyr)
library(tibble)

df <- tribble(
    ~name, ~id,  ~cost, ~value1 , ~value2, ~value3,
    "a",     1,     10,       1,        1,       1,
    "a",     2,     20,       1,        2,       1,
    "b",     3,     50,       1,        1,       3,
    "b",     4,     45,       1,        1,       1,
    "b",     5,     70,       2,        2,       2
)


df %>% filter(
    value1 == value2 &
    value1 == value3 &
    value2 == value3
)

# A tibble: 3 × 6
  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2

Now, let's suppose the number of columns to be compared is very large (> 10). All of them start with value, so we may have value_something, value_otherthing, value_morething, i.e., the suffixes are not necessarily numbers as in this example. However, if there are n such columns, naively I have to write n * (n - 1) / 2 pairwise comparisons, which quickly becomes unmanageable; even the n - 1 comparisons against a single column that the transitivity of equality allows are tedious to write by hand.
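To put rough numbers on this: checking every pair of n columns takes n * (n - 1) / 2 comparisons, while comparing each column with one reference column takes only n - 1, because equality is transitive. A quick base R calculation for a few illustrative values of n:

```r
# Comparisons needed to confirm that n columns hold the same value in a row
n <- c(3, 10, 20)
pairwise <- n * (n - 1) / 2  # every pair of columns
chained <- n - 1             # each column against the first one (equality is transitive)
print(data.frame(n, pairwise, chained))
```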

Is there something like

df %>% filter(all_same(starts_with("value")))

where all_same() compares all the columns selected by starts_with() (or any other selector)?

rowwise() and across() didn't help me much either.
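For reference, rowwise() can express this when it is paired with c_across() (added in dplyr 1.0.0): c_across() collects the current row's selected columns into one vector, and n_distinct() counts its distinct values. A minimal sketch, assuming dplyr >= 1.0.0 is installed; row-wise evaluation tends to be slow on large data frames.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~name, ~id, ~cost, ~value1, ~value2, ~value3,
  "a",     1,    10,       1,       1,       1,
  "a",     2,    20,       1,       2,       1,
  "b",     3,    50,       1,       1,       3,
  "b",     4,    45,       1,       1,       1,
  "b",     5,    70,       2,       2,       2
)

# rowwise() makes filter() evaluate its condition one row at a time;
# c_across() gathers that row's value* columns into a single vector,
# and n_distinct() == 1 keeps the rows where all of them are equal.
out <- df %>%
  rowwise() %>%
  filter(n_distinct(c_across(starts_with("value"))) == 1) %>%
  ungroup()  # drop the row-wise grouping again

out
```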


琉璃繁缕 2025-02-03 16:08:24

We can use if_all() to loop over the columns from value2 to value3 and check whether each one equals value1; if_all() returns TRUE only for rows where all of the column comparisons are TRUE.

library(dplyr)
df %>%
    filter(if_all(value2:value3, ~ value1 == .x))

Output

# A tibble: 3 × 6
  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2

Or, if we want to use starts_with():

df %>%
    filter(if_all(starts_with('value'), ~ value1 == .x))
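if_all() effectively ANDs together one logical vector per selected column. The same idea can be packaged as the all_same() helper the question asks for. The version below is a hypothetical sketch (all_same() is not a dplyr function) that compares every column with the first selected one, so it does not depend on a column named value1. It is plain base R, so the example data is rebuilt with data.frame() to keep it runnable without packages.

```r
# Hypothetical helper: TRUE for rows where every column of `cols` has the same value.
# `cols` is a data frame containing only the columns to compare.
all_same <- function(cols) {
  first <- cols[[1]]
  # Compare each remaining column with the first (n - 1 comparisons) and AND the results;
  # `init` makes a single selected column count as "all the same".
  Reduce(`&`, lapply(cols[-1], function(col) col == first),
         init = rep(TRUE, nrow(cols)))
}

df <- data.frame(
  name   = c("a", "a", "b", "b", "b"),
  id     = c(1, 2, 3, 4, 5),
  cost   = c(10, 20, 50, 45, 70),
  value1 = c(1, 1, 1, 1, 2),
  value2 = c(1, 2, 1, 1, 2),
  value3 = c(1, 1, 3, 1, 2)
)

keep <- all_same(df[startsWith(names(df), "value")])
df[keep, ]
```

With dplyr >= 1.1.0, which provides pick(), the helper can be called the way the question imagines: df %>% filter(all_same(pick(starts_with("value")))).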


纸伞微斜 2025-02-03 16:08:24

Here's a possible base R option: count the number of unique values in each row, using only the "value" columns, and keep the rows where there is exactly one.

df[apply(df[, -c(1:3)], 1, function(x) length(unique(x)) == 1), ]

Another option is to use startsWith() to select the columns whose names start with "value", instead of relying on column positions.

df[apply(df[, startsWith(names(df), "value")], 1, function(x)
  length(unique(x)) == 1),]

Output

  name     id  cost value1 value2 value3
  <chr> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 a         1    10      1      1      1
2 b         4    45      1      1      1
3 b         5    70      2      2      2
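apply() calls the function once per row, after coercing the selected columns to a matrix with as.matrix(). A vectorized base R alternative (a different technique from the apply() call above) compares every value column with the first one in a single step and keeps the rows where no column differs. The example data is rebuilt with data.frame() so the sketch runs without packages; rows containing NA values would need extra handling.

```r
df <- data.frame(
  name   = c("a", "a", "b", "b", "b"),
  id     = c(1, 2, 3, 4, 5),
  cost   = c(10, 20, 50, 45, 70),
  value1 = c(1, 1, 1, 1, 2),
  value2 = c(1, 2, 1, 1, 2),
  value3 = c(1, 1, 3, 1, 2)
)

vals <- df[startsWith(names(df), "value")]
# Comparing a data frame with a vector compares every column with that vector,
# giving a logical matrix with one row per data-frame row.
differs <- vals != vals[[1]]
# Keep the rows where no value column differs from the first one
keep <- rowSums(differs) == 0
df[keep, ]
```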


About the author: 半枫
