Subset a data frame by the top n rows of each group, ordered by a variable
I would like to subset a data frame to the top n rows of each group, where the groups are defined by one variable and the rows are sorted in descending order by another. An example makes this clear:
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F",
"F"), Age = c(15, 38, 17, 35, 26, 24, 20, 26))
For each Gender, I would like to get the 2 rows with the highest Age, sorted by Age in descending order. The desired output is:
Gender Age
F 35
F 26
M 38
M 26
I looked at order, sort and other solutions here, but could not find an appropriate solution to this problem. I appreciate your help.
Comments (6)
One solution using ddply() from plyr:
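The code for this answer did not survive on this page. A minimal sketch of what a ddply() approach could look like, assuming plyr is installed and using the d1 data frame from the question (the name top2 is only illustrative):

```r
library(plyr)

d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Split d1 by Gender; within each group, sort by Age (descending)
# and keep the first 2 rows. ddply() binds the pieces back together.
top2 <- ddply(d1, "Gender", function(x) {
  head(x[order(x$Age, decreasing = TRUE), ], 2)
})
top2
```

Changing the 2 passed to head() changes how many rows are kept per group.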
With the data.table package:
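The original code is missing here as well. One way this might be written with data.table, assuming the package is installed (dt and top2 are illustrative names, not taken from the answer):

```r
library(data.table)

d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

dt <- as.data.table(d1)

# Order rows by Gender (ascending) and Age (descending),
# then take the first 2 rows of each Gender group.
top2 <- dt[order(Gender, -Age), head(.SD, 2), by = Gender]
top2
```

Here .SD is the subset of rows belonging to the current group, so head(.SD, 2) keeps the two oldest rows per Gender after sorting.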
I'm sure there is a better answer, but here is one way:
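The snippet that followed was not preserved. Given the next sentence about inspecting rows visually, it was presumably a selection by row position; the sketch below is a guess along those lines, with the row numbers read off the example data by hand:

```r
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Rows 4 and 8 are the two oldest females, rows 2 and 5 the two oldest males.
# These positions only apply to this exact data frame.
top2 <- d1[c(4, 8, 2, 5), ]
top2
```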
If you have a larger data frame than the one you provided here and don't want to inspect visually which rows to select, just use this:
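The general version is also missing from this page. A base R sketch of the same idea that does not depend on knowing row positions (an assumption about the approach, not the answer's original code):

```r
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Split into one data frame per Gender, sort each by Age (descending),
# keep the first 2 rows, and stack the pieces back together.
pieces <- lapply(split(d1, d1$Gender), function(x) {
  head(x[order(x$Age, decreasing = TRUE), ], 2)
})
top2 <- do.call(rbind, pieces)
top2
```

The row names of the result combine the group name and the original row number; rownames(top2) <- NULL resets them if that is unwanted.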
It is even easier than that if you just want to do the sorting. Sort the data frame by Gender and by descending Age first; you can then subset the top two rows of each Gender subgroup.
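The two calls this answer referred to are not shown on this page. A plausible base R sketch of the sort-then-subset sequence, where d1_sorted and top2 are illustrative names:

```r
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Step 1: sort by Gender, then by Age in descending order.
d1_sorted <- d1[order(d1$Gender, -d1$Age), ]

# Step 2: apply head(n = 2) to each Gender subgroup of the sorted data
# and bind the results into a single data frame.
top2 <- do.call(rbind, by(d1_sorted, d1_sorted$Gender, head, n = 2))
top2
```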
I have a suggestion if you need, for example, the first 2 females and the first 3 males:
You just need to change the names of the final dataframe.
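The suggested code is not preserved here. As a hedged sketch, a different number of rows per group could be handled with a named lookup vector; n_per_group is a hypothetical helper introduced for illustration, and the rows are assumed to be ranked by descending Age as in the question:

```r
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Hypothetical lookup: how many rows to keep for each Gender.
n_per_group <- c(F = 2, M = 3)

groups <- split(d1, d1$Gender)
pieces <- lapply(names(groups), function(g) {
  x <- groups[[g]]
  # Sort this group by Age (descending) and keep its own number of rows.
  head(x[order(x$Age, decreasing = TRUE), ], n_per_group[[g]])
})
result <- do.call(rbind, pieces)
result
```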
I had a similar problem and found this method really fast when used on a data.frame with 1.5 million records.
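The method this comment refers to is not shown on this page, so the following is not a reconstruction of it. It is one vectorised base R sketch that avoids splitting the data frame, using ave() to rank Age within each Gender; no claim is made here about its speed on large data:

```r
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Rank rows within each Gender by descending Age (rank 1 = oldest);
# ties are broken by order of appearance.
age_rank <- ave(-d1$Age, d1$Gender,
                FUN = function(a) rank(a, ties.method = "first"))

# Keep ranks 1 and 2, then order the result for display.
top2 <- d1[age_rank <= 2, ]
top2 <- top2[order(top2$Gender, -top2$Age), ]
top2
```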