R - Extracting duplicates into a data frame
I need help with R, similar to the question filtering-a-dataframe-showing-only-duplicates. I wish to extract duplicates from a data frame with over 2,000 entries.
The first 15 rows of data look like this:
run | id | Diff |
---|---|---|
1 | 20 | 0 |
1 | 4 | 1024 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 4 | 65 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 11 | 475 |
1 | 11 | 1 |
1 | 11 | 1 |
2 | 25 | 0 |
2 | 18 | 0 |
2 | 18 | 1 |
2 | 18 | 1 |
2 | 18 | 1 |
I wish to extract only the duplicates, i.e.
run | id | Diff |
---|---|---|
1 | 4 | 1024 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 4 | 65 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 11 | 475 |
1 | 11 | 1 |
1 | 11 | 1 |
2 | 18 | 0 |
2 | 18 | 1 |
2 | 18 | 1 |
2 | 18 | 1 |
Using the command
mydata_extract %>% group_by(id) %>% filter(n() > 1)
does not extract the duplicates; in fact, I get the complete data set returned. Is there something about "filter(n() > 1)" that I need to change? I'm a beginner with R.
Sorry, my data table is not formatting correctly; it looks okay in preview!
I will also want to group my data first by "run".
Comments (1)
Maybe add run and id to group_by()? You can add a mutate() to see how n() works (it counts the number of rows per group), e.g.
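A minimal sketch of this suggestion, assuming the dplyr package is installed and using the 15 sample rows from the question as mydata_extract. The column name n_in_group is only illustrative, and this is not necessarily the code the commenter posted:

```r
library(dplyr)

# Sample rows copied from the question (first 15 rows of the data)
mydata_extract <- data.frame(
  run  = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  id   = c(20, 4, 4, 4, 4, 4, 4, 11, 11, 11, 25, 18, 18, 18, 18),
  Diff = c(0, 1024, 1, 1, 65, 1, 1, 475, 1, 1, 0, 0, 1, 1, 1)
)

# Add the group size as a column, to see what n() returns for each row
with_counts <- mydata_extract %>%
  group_by(run, id) %>%
  mutate(n_in_group = n()) %>%
  ungroup()

# Keep only the rows whose (run, id) combination occurs more than once
duplicates <- mydata_extract %>%
  group_by(run, id) %>%
  filter(n() > 1) %>%
  ungroup()

print(with_counts)
print(duplicates)
```

On the sample rows, the (run, id) pairs (1, 20) and (2, 25) each occur only once, so those two rows should be dropped and the remaining rows kept in their original order. If the full data set still comes back unchanged, inspecting n_in_group in with_counts is one way to check whether the groups are being formed as expected.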