How do I create a counter/count by group?
UPDATE: My data has an extra variable I'd like to group by. I used ddply with the solution Richie provided below, but it did not work.
I've got some data in the following shape:
Country,group, date
US,A,'2011-10-01'
US,B,'2011-10-01'
US,C,'2011-10-01'
MX,D,'2011-10-01'
UK,E,'2011-10-02'
UK,B,'2011-10-02'
UK,A,'2011-10-02'
UK,C,'2011-10-02'
The data frame is already ordered, so A came first, B second, and so on. What I am trying to create is a rank variable by date, like this:
Country,group, date,rank
US,A,'2011-10-01',1
US,B,'2011-10-01',2
US,C,'2011-10-01',3
MX,D,'2011-10-01',1
UK,E,'2011-10-02',1
UK,B,'2011-10-02',2
UK,A,'2011-10-02',3
UK,C,'2011-10-02',4
....
First, check that your date really is in a date format (not a factor) using class(your_dataset$date). If not, use ymd from lubridate to convert it.
Second, use rank to get the rank. (Easier than you think, right!)
your_dataset$rank <- rank(your_dataset$date)
There are a few different methods for breaking ties that you might want to explore.
Upon rereading your question, I see you don't want to rank the dates, you want a counter within the dates. To do this, first check that your dataset is ordered by date.
Then call seq_len on each chunk of date.
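A minimal sketch of the counter-within-group idea, using the sample rows from the question. It is not the answerer's original code: it assumes base R only, so as.Date stands in for lubridate's ymd and ave stands in for ddply. It also assumes the counter should restart for each Country and date pair, which is what the desired output in the question shows.

```r
# Sample data from the question; dates start out as plain strings.
your_dataset <- data.frame(
  Country = c("US", "US", "US", "MX", "UK", "UK", "UK", "UK"),
  group   = c("A", "B", "C", "D", "E", "B", "A", "C"),
  date    = c(rep("2011-10-01", 4), rep("2011-10-02", 4)),
  stringsAsFactors = FALSE
)

# Convert to a real Date (assumption: base R as.Date instead of lubridate::ymd).
your_dataset$date <- as.Date(your_dataset$date)
class(your_dataset$date)  # "Date"

# Make sure the rows are ordered by date; ties keep their existing order.
your_dataset <- your_dataset[order(your_dataset$date), ]

# ave() splits the row indices by Country and date, numbers each chunk
# with seq_along, and returns the result in the original row order.
your_dataset$rank <- ave(
  seq_len(nrow(your_dataset)),
  your_dataset$Country, your_dataset$date,
  FUN = seq_along
)

your_dataset$rank  # 1 2 3 1 1 2 3 4
```

To count within date alone, drop your_dataset$Country from the ave call. With plyr loaded, the same grouping can be expressed through ddply by applying seq_len or seq_along to each chunk.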