unique() 用于多个变量
我在 R 中有以下数据框:
> str(df)
'data.frame': 545227 obs. of 15 variables:
$ ykod : int 93 93 93 93 93 93 93 93 93 93 ...
$ yad : Factor w/ 42 levels "BAKUGAN","BARBIE",..: 30 30 30 30 30 30 30 30 30 30 ...
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
$ donem: int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ sayi : int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ mkod : int 4 5 9 11 12 18 20 22 25 26 ...
$ mad : Factor w/ 10464 levels " Defne Market ",..: 405 8075 9710 10145 9297 7973 2542 3892 2759 5769 ...
$ mtip : Factor w/ 29 levels "Abone Bürosu ",..: 2 20 20 2 2 2 2 2 2 2 ...
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 2 2 ...
$ bkod : int 110565 110565 110565 110565 110565 110565 110565 110565 110565 110565 ...
$ bad : Factor w/ 212 levels "4. Levent","500 Evler",..: 167 167 167 167 167 167 167 167 167 167 ...
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
$ sevk : int 2 3 3 3 2 2 2 6 2 2 ...
$ iade : int 2 1 0 2 0 2 1 0 0 2 ...
$ satis: int 0 2 3 1 2 0 1 6 2 0 ...
我想列出所选多个变量的唯一(如 SQL 的 DISTINCT)值。例如,unique(yad)
给出了每 42 个元素的名称,但我需要提取两列(yad
和 per
在一起,具有所有独特的组合):
yad per
--- ---
BARBIE AYLIK
BAKUGAN 2 AYLIK
MICKEY MOUSE 2 AYLIK
TINKERBELL 3 AYLIK
... ...
我怎样才能实现这一目标?
I have the following data frame in R:
> str(df)
'data.frame': 545227 obs. of 15 variables:
$ ykod : int 93 93 93 93 93 93 93 93 93 93 ...
$ yad : Factor w/ 42 levels "BAKUGAN","BARBIE",..: 30 30 30 30 30 30 30 30 30 30 ...
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
$ donem: int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ sayi : int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
$ mkod : int 4 5 9 11 12 18 20 22 25 26 ...
$ mad : Factor w/ 10464 levels " Defne Market ",..: 405 8075 9710 10145 9297 7973 2542 3892 2759 5769 ...
$ mtip : Factor w/ 29 levels "Abone Bürosu ",..: 2 20 20 2 2 2 2 2 2 2 ...
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 2 2 ...
$ bkod : int 110565 110565 110565 110565 110565 110565 110565 110565 110565 110565 ...
$ bad : Factor w/ 212 levels "4. Levent","500 Evler",..: 167 167 167 167 167 167 167 167 167 167 ...
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
$ sevk : int 2 3 3 3 2 2 2 6 2 2 ...
$ iade : int 2 1 0 2 0 2 1 0 0 2 ...
$ satis: int 0 2 3 1 2 0 1 6 2 0 ...
I want to list unique (like SQL's DISTINCT) values for selected multiple variables. For example, unique(yad)
gives me the names of each 42 elements, but I need to extract two columns (yad
and per
together, with all unique combinations):
yad per
--- ---
BARBIE AYLIK
BAKUGAN 2 AYLIK
MICKEY MOUSE 2 AYLIK
TINKERBELL 3 AYLIK
... ...
How can I achieve this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
使用
unique()
本身怎么样?How about using
unique()
itself?基于任何列都是唯一的,并使用 dplyr 保留所有其他列。
unique based on any columns and keep all other columns using
dplyr
.这是对乔希答案的补充。
您还可以保留其他变量的值,同时过滤掉 data.table 中的重复
行示例:
警报:如果其他变量中的值有不同的组合,那么您的结果将是
V1 和 V2 的唯一组合
This is an addition to Josh's answer.
You can also keep the values of other variables while filtering out duplicated rows in data.table
Example:
Alert: If there are different combinations of values in the other variables, then your result will be
unique combination of V1 and V2
有几种方法可以获得一组因素的所有独特组合。
There are a few ways to get all unique combinations of a set of factors.
这个 dplyr 方法在管道传输时效果很好。
对于选定的列:
或对于整个数据框:
This
dplyr
method works nicely when piping.For selected columns:
Or for the whole dataframe:
这是一个老问题,有很多解决方案。
然而,如果您需要基于所选列的独特观察,同时保留数据框中的所有其他列,您可以使用基本 R 以干净的方式完成此操作,如下所示:
另一种方法是使用 @micahkimel 提议的“不同”
This is an old question with many solutions.
Yet, in case you need unique observations based on a selection of columns while also keeping all other columns in the dataframe, you can do it in a clean way using base R as follows:
The alternative is using 'distinct' as proposed by @micahkimel