Counting occurrences of a factor within groups in R
This is my data:
> head(Kandula_for_n)
                 date      dist  date_only
1 2005-05-08 12:00:00  138.5861 2005-05-08
2 2005-05-08 16:00:00 1166.9265 2005-05-08
3 2005-05-08 20:00:00 1270.7149 2005-05-08
6 2005-05-09 08:00:00  233.1971 2005-05-09
7 2005-05-09 12:00:00 1899.9530 2005-05-09
8 2005-05-09 16:00:00  726.8363 2005-05-09
I would now like to have an additional column with the count (n) of the data entries (dist) per day. For 2005-05-08, this would be n=3, as there are 3 data entries at 12, 16 and 20 o'clock. I have applied the following code, which actually gave me what I wanted:
ndist <- tapply(1:NROW(Kandula_for_n), Kandula_for_n$date_only, function(x) length(unique(x)))
After ndist <- as.data.frame(ndist), I got this:
> head(ndist)
           ndist
2005-05-08     3
2005-05-09     4
2005-05-10     6
2005-05-11     4
2005-05-12     6
2005-05-13     6
The problem is that the count is together with date_only in one column that is called ndist. But I would need them in two separate columns, one with the count and one with date_only. How can this be done?
I guess it's rather simple, but I just don't get it.
I would appreciate it if you could give me any thoughts on that.
Thanks for your efforts.
Comments (3)
Those are just the row names. You're good to go:
EDIT: or
ndist = data.frame(date = names(ndist), ndist)
depending on whether it is already a data frame or not.
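To make the two cases in this answer concrete, here is a minimal sketch. The sample data frame only mirrors the six rows shown in the question, and the names counts_a, counts_b and ndist_df are made up for illustration. It is one possible way to do it, not necessarily the code the answer originally contained.

```r
# Hypothetical sample: only the six rows shown in the question's head() output
Kandula_for_n <- data.frame(
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530, 726.8363),
  date_only = c(rep("2005-05-08", 3), rep("2005-05-09", 3)),
  stringsAsFactors = FALSE
)

# Count rows per day; the result is a 1-d array whose names are the dates
ndist <- tapply(seq_len(nrow(Kandula_for_n)), Kandula_for_n$date_only, length)

# Case 1: ndist is still the tapply() result, so the dates are its names
counts_a <- data.frame(date_only = names(ndist),
                       ndist = as.vector(ndist),
                       row.names = NULL)

# Case 2: ndist was already converted with as.data.frame(),
# so the dates are now the row names of that data frame
ndist_df <- as.data.frame(ndist)
counts_b <- data.frame(date_only = rownames(ndist_df),
                       ndist = as.vector(ndist_df[[1]]),
                       row.names = NULL)
```

Either way you end up with one column for the date and one for the count.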
How about something a bit more simple:
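The code that went with this answer is not shown here, so the following is only a guess at what a simpler route could look like: base R's table() counts the rows per day, and as.data.frame() turns that into two columns. The sample data and the names counts, with_n and the column n are assumptions for illustration.

```r
# Hypothetical sample: only the six rows shown in the question's head() output
Kandula_for_n <- data.frame(
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530, 726.8363),
  date_only = c(rep("2005-05-08", 3), rep("2005-05-09", 3)),
  stringsAsFactors = FALSE
)

# table() counts rows per date; as.data.frame() gives one row per date
# with the date in one column and the count in another
counts <- as.data.frame(table(date_only = Kandula_for_n$date_only),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Optional: attach the per-day count to every original row
with_n <- merge(Kandula_for_n, counts, by = "date_only")
```

The merge() step is only needed if you want the count repeated next to each dist entry rather than as a separate summary table.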
Simply because I find tapply() hard to wrap my brain around, I like using plyr for these types of things:
This will give you something like:
The ddply line:
ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )
takes the input data, groups it by the date_only field, and for every unique value it applies the anonymous function to the data frame made up of only the records with the same value for date_only. My anonymous function simply takes the data.frame x and appends a column named ndist, which is the number of rows in x.
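If you would rather not load plyr, the same group-then-append idea can be sketched in base R with ave(). This is a swapped-in alternative, not the ddply call above, and the sample data again only mirrors the six rows shown in the question.

```r
# Hypothetical sample: only the six rows shown in the question's head() output
Kandula_for_n <- data.frame(
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530, 726.8363),
  date_only = c(rep("2005-05-08", 3), rep("2005-05-09", 3)),
  stringsAsFactors = FALSE
)

# ave() splits dist by date_only, applies length() to each group,
# and returns the result aligned with the original rows
Kandula_for_n$ndist <- ave(Kandula_for_n$dist,
                           Kandula_for_n$date_only,
                           FUN = length)
```

Like the ddply version, this keeps every original row and repeats the day's count in the new ndist column.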