Find the longest stretch of rows within each group using tidyverse or other R commands
I am not sure the title describes my question correctly, but the idea is this:
After using group_by(), I would like to find the longest stretch of rows for each group, in a way that respects the current row order. In other words, a group may contain one or more discontinuities (e.g. after arrange() by some other columns). I would like to add a new column (e.g. with mutate()) that flags the rows belonging to the longest stretch of each group. Below is an example:
data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 3, 1, 2, 2, 2),
           order = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17))
From this, I would like to get a data frame like the following:
data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 3, 1, 2, 2, 2),
           order = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
           longest = c(T, T, T, F, F, T, T, T, T, T, F, F, F, F, T, T, T))
Comments (2)
In base R:
Another base R approach:
In data.table:
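A minimal base R sketch of the idea, not necessarily the code these answers used: run-length encode the group column with rle(), spread each run's length back over its rows, and compare it with the per-group maximum via ave(). The names df, runs and run_len are chosen here for illustration.

```r
# Example data from the question
df <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 3, 1, 2, 2, 2),
  order = 1:17
)

# Each run is one uninterrupted stretch of the same group value
runs <- rle(df$group)

# Length of the stretch that each row belongs to
run_len <- rep(runs$lengths, runs$lengths)

# Flag rows whose stretch is as long as the longest stretch of their group
df$longest <- run_len == ave(run_len, df$group, FUN = max)

df
```

With this comparison, stretches that tie for the maximum length within a group are all flagged.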
We could create an identifier for each run of consecutive values in the group column and count the rows in each run. Then we can group by group and return TRUE for the rows whose run has the greatest number of consecutive rows in that group.
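The steps just described could be sketched with dplyr as follows. This is an illustration under assumptions rather than the answer's exact code: it relies on consecutive_id(), which requires dplyr 1.1.0 or later, and the helper columns run_id and run_len are names made up for this example.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 3, 1, 2, 2, 2),
  order = 1:17
)

result <- df %>%
  # New id every time the value of group changes between adjacent rows
  mutate(run_id = consecutive_id(group)) %>%
  # Number of rows in each run
  add_count(run_id, name = "run_len") %>%
  group_by(group) %>%
  # TRUE where the row's run is as long as the longest run of its group
  mutate(longest = run_len == max(run_len)) %>%
  ungroup() %>%
  select(-run_id, -run_len)

result
```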
If several runs within a group tie for the longest length and you only want the first of them flagged as TRUE, you could restrict the flag to that first run.
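One way to break ties in favour of the first run, again as a hedged sketch rather than the answer's own code: which.max() returns the position of the first maximum, so comparing each row's run id with the id at that position flags a single run per group. The small data frame below is a made-up example with a tie in group 1, and consecutive_id() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data: group 1 has two runs of length 2 (a tie)
df_tie <- data.frame(
  group = c(1, 1, 2, 1, 1),
  order = 1:5
)

result_first <- df_tie %>%
  mutate(run_id = consecutive_id(group)) %>%
  add_count(run_id, name = "run_len") %>%
  group_by(group) %>%
  # which.max() picks the first longest run, so later ties stay FALSE
  mutate(longest = run_id == run_id[which.max(run_len)]) %>%
  ungroup() %>%
  select(-run_id, -run_len)

result_first
```

Here only the first two rows of group 1 are flagged, while the later run of equal length is left as FALSE.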