Selecting the rows with the largest value of a variable within each group in R

Posted on 2024-09-01 11:56:22

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))

a.3[r,]

This returns each row's index within its group (the list element produced by split), not its index in the entire data.frame.

I'm trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?

小耗子 2024-09-08 11:56:22

The ddply and ave approaches are both fairly resource-intensive, I think. ave fails by running out of memory for my current problem (67,608 rows, with four columns defining the unique keys). tapply is a handy choice, but what I generally need is the whole row that holds the extreme value (largest, smallest, and so on) of some column for each unique key, where the key is usually defined by more than one column. The best solution I've found is to sort and then use the negation of duplicated to select only the first row for each unique key. For the simple example here:

a <- sample(1:10,100,replace=T)
b <- sample(1:100,100,replace=T)
f <- data.frame(a, b)

sorted <- f[order(f$a, -f$b),]
highs <- sorted[!duplicated(sorted$a),]

I think the performance gains over ave or ddply, at least, are substantial. It is slightly more complicated for multi-column keys, but order will handle a whole bunch of things to sort on and duplicated works on data frames, so it's possible to continue using this approach.
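
The multi-column case mentioned above could look roughly like the following. This is a minimal sketch, not the poster's actual code: the column names k1, k2 and v and the data are made up for illustration, and negating v only works because it is numeric.

```r
# Hypothetical example data: two key columns (k1, k2) and a value column (v).
set.seed(1)
f <- data.frame(
  k1 = sample(1:3, 50, replace = TRUE),
  k2 = sample(c("x", "y"), 50, replace = TRUE),
  v  = sample(1:100, 50, replace = TRUE)
)

# Sort by both keys, then by descending value within each key combination.
sorted <- f[order(f$k1, f$k2, -f$v), ]

# duplicated() on the key columns flags every row after the first
# occurrence of each (k1, k2) combination; negate it to keep the first.
highs <- sorted[!duplicated(sorted[, c("k1", "k2")]), ]
```

Because only the first row per key survives, ties for the maximum are reduced to a single row, which may or may not be what you want.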

何其悲哀 2024-09-08 11:56:22

library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))

唔猫 2024-09-08 11:56:22

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

The answer by Jonathan Chang gets you what you explicitly asked for, but I'm guessing that you want the actual row from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
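
One thing worth knowing about the ave comparison is how it treats ties. The sketch below uses a tiny made-up data frame (the names d, g and v are illustrative only) in which group 1 reaches its maximum twice.

```r
# Small hypothetical data set with a tie in group 1.
d <- data.frame(g = c(1, 1, 1, 2, 2), v = c(5, 9, 9, 3, 7))

# ave() returns a vector as long as v, holding each row's group maximum.
group_max <- ave(d$v, d$g, FUN = max)

# Comparing against v keeps every row that attains its group maximum,
# so both rows with v = 9 in group 1 are selected.
d[group_max == d$v, ]
```

If exactly one row per group is needed, the result has to be de-duplicated afterwards or a first-match approach such as which.max used instead.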

遇到 2024-09-08 11:56:22

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
m<-split(a.3,a.2)
u<-function(x){
    a<-rownames(x)
    b<-which.max(x[,2])
    as.numeric(a[b])
    }
r<-sapply(m,FUN=function(x) u(x))

a.3[r,]

This does the trick, although it is somewhat cumbersome. It does let me grab the rows holding the largest value in each group. Any other ideas?
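
A somewhat shorter variant of the same idea, offered as a sketch rather than a definitive answer, is to split the row numbers instead of the rows. Each piece then already contains positions in the full data frame, so no detour through rownames is needed. The name idx_by_group and the seed are assumptions added for illustration.

```r
set.seed(42)
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# Split the row numbers (not the rows) by group, so each piece already
# holds positions in the full data frame.
idx_by_group <- split(seq_len(nrow(a.3)), a.3$a.2)

# Within each group, pick the row number whose b.2 is largest.
# which.max() returns the first maximum, so ties resolve to the earliest row.
r <- sapply(idx_by_group, function(i) i[which.max(a.3$b.2[i])])

a.3[r, ]
```

This also avoids relying on the row names being the default 1, 2, 3, ..., which as.numeric(rownames(x)) silently assumes.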

爺獨霸怡葒院 2024-09-08 11:56:22

> a.2<-sample(1:10,100,replace=T)
> b.2<-sample(1:100,100,replace=T)
> tapply(b.2, a.2, max)
 1  2  3  4  5  6  7  8  9 10 
99 92 96 97 98 99 94 98 98 96

新人笑 2024-09-08 11:56:22

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

With aggregate, you can get the maximum for each group in one line:

aggregate(a.3, by = list(a.3$a.2), FUN = max)

This produces the following output:

   Group.1 a.2 b.2
1        1   1  96
2        2   2  82
...
8        8   8  85
9        9   9  93
10      10  10  97
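
Note that aggregate applies FUN to every column independently. With only a.2 and b.2 that is harmless, but with further columns the output rows would no longer correspond to actual rows of the data. One possible way around this, sketched here with a made-up extra column c.2, is to aggregate b.2 alone and merge the result back.

```r
set.seed(7)
a.3 <- data.frame(
  a.2 = sample(1:10, 100, replace = TRUE),
  b.2 = sample(1:100, 100, replace = TRUE),
  c.2 = sample(letters, 100, replace = TRUE)  # hypothetical extra column
)

# Group maxima of b.2 only, via the formula interface.
maxima <- aggregate(b.2 ~ a.2, data = a.3, FUN = max)

# Join back on both columns to recover the full rows that attain each
# group maximum (tied rows, if any, are all returned).
rows <- merge(a.3, maxima, by = c("a.2", "b.2"))
```

The merge step is what turns the per-group maxima into whole rows; on large data it may be slower than the sort-based approach described in another answer.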

About the author: A君
