summarise(): catching non-unique values that should be unique

Posted 2025-01-25 13:29:23

Using summarise() and group_by(), I want to catch values that should be unique within a group but are not.

Example:

library(dplyr)  # provides group_by() and summarise()

dd <- data.frame(person=rep(1:3, each=2),
  place=rep(c("earth", "moon"), times=3),
  height=rep(c(60, 70, 80), each=2),
  weight=c(10, 60, 12, 72, 15, 90))
dd

  person place height weight
1      1 earth     60     10
2      1  moon     60     60
3      2 earth     70     12
4      2  moon     70     72
5      3 earth     80     15
6      3  moon     80     90

This works just fine:

byPerson <- summarise(.data=group_by(.data=dd, person), 
  height=unique(height))
byPerson

  person height
1      1     60
2      2     70
3      3     80

But:

byPerson2 <- summarise(.data=group_by(.data=dd, person), 
  height=unique(height), weight=unique(weight))

Earlier versions of dplyr raised an error here, because weight is not unique within person. The current version returns this result,

  person height weight
1      1     60     10
2      1     60     60
3      2     70     12
4      2     70     72
5      3     80     15
6      3     80     90

instead of notifying the user of the error.

Is there a way to get the earlier behavior back, preferably by setting a flag or something similar? I could check for non-unique values in my own code, but that's more of a pain.
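
For reference, the "check in code" fallback mentioned above could look roughly like the sketch below. It is only an illustration, not a dplyr feature: unique_or_stop() is a hypothetical helper name, and the sketch assumes that an error raised inside summarise() is an acceptable substitute for the old behavior.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): return the single distinct value
# of x, or stop with an error if x does not hold exactly one distinct value.
unique_or_stop <- function(x) {
  u <- unique(x)
  if (length(u) != 1) {
    stop("expected exactly 1 unique value, found ", length(u), call. = FALSE)
  }
  u
}

dd <- data.frame(person = rep(1:3, each = 2),
  place = rep(c("earth", "moon"), times = 3),
  height = rep(c(60, 70, 80), each = 2),
  weight = c(10, 60, 12, 72, 15, 90))

# height is constant within person, so this succeeds with one row per person.
byPerson <- summarise(group_by(dd, person), height = unique_or_stop(height))

# weight varies within person, so this raises an error; try() captures it
# so the rest of the script keeps running.
res <- try(
  summarise(group_by(dd, person), weight = unique_or_stop(weight)),
  silent = TRUE)
```

Swapping unique() for the helper keeps the summarise() call the same shape while turning silent row expansion into a hard failure. As far as I know, dplyr 1.1.0 and later also emit a deprecation warning when a summarise() group returns more than one row and point to reframe() instead, so promoting warnings to errors may be another route, though I have not verified that against every version.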
