Counting factor occurrences within groups in R

Posted 2024-12-12 10:09:54

This is my data:

> head(Kandula_for_n)
                date      dist  date_only
1 2005-05-08 12:00:00  138.5861 2005-05-08
2 2005-05-08 16:00:00 1166.9265 2005-05-08
3 2005-05-08 20:00:00 1270.7149 2005-05-08
6 2005-05-09 08:00:00  233.1971 2005-05-09
7 2005-05-09 12:00:00 1899.9530 2005-05-09
8 2005-05-09 16:00:00  726.8363 2005-05-09

I would now like to have an additional column with the count (n) of the data entries (dist) per day. For 2005-05-08, this would be n=3, as there are 3 data entries at 12, 16 and 20 o'clock. I have applied the following code, which actually gave me what I wanted:

ndist <- tapply(1:NROW(Kandula_for_n), Kandula_for_n$date_only, function(x) length(unique(x)))
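For anyone who wants to run the call above, here is a minimal sketch that rebuilds only the six rows printed by head() (the full data set is not shown, so these values and the UTC time zone are assumptions). Because the indices passed to tapply() are all distinct, length(unique(x)) is the same as length(x), so the sketch uses length directly. The point to notice is that the result is a one-dimensional array whose names are the dates.

```r
# Rebuild the six rows shown by head(Kandula_for_n) above.
# Assumption: only these rows are known; the time zone is taken to be UTC.
Kandula_for_n <- data.frame(
  date = as.POSIXct(c("2005-05-08 12:00:00", "2005-05-08 16:00:00",
                      "2005-05-08 20:00:00", "2005-05-09 08:00:00",
                      "2005-05-09 12:00:00", "2005-05-09 16:00:00"), tz = "UTC"),
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530, 726.8363)
)
Kandula_for_n$date_only <- as.Date(Kandula_for_n$date)

# Count the rows per day. The dates end up as the *names* of the
# 1-D array that tapply() returns, not as a separate column.
ndist <- tapply(seq_len(nrow(Kandula_for_n)), Kandula_for_n$date_only, length)
print(ndist)
```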

After ndist <- as.data.frame(ndist), I got this:

> head(ndist)
           ndist
2005-05-08     3
2005-05-09     4
2005-05-10     6
2005-05-11     4
2005-05-12     6
2005-05-13     6

The problem is that the count is together with date_only in one column called ndist. But I need them in two separate columns, one with the count and one with date_only. How can this be done?
I guess it's rather simple, but I just don't get it.
I would appreciate it if you could give me any thoughts on that.

Thanks for your efforts.

Comments (3)

拥抱我好吗 2024-12-19 10:09:55

Those are just the row names. You're good to go:

ndist$date = row.names(ndist)

EDIT: or, if ndist is still the named array returned by tapply() rather than a data frame, use ndist = data.frame(date = names(ndist), ndist).
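A small sketch of both routes, using a hypothetical date vector chosen so the first two counts match the question's output (3 and 4). The column name date_only and the conversion back to Date with as.Date() are my additions; the answer itself only copies the names as text.

```r
# Hypothetical stand-in for Kandula_for_n$date_only
dates <- as.Date(c("2005-05-08", "2005-05-08", "2005-05-08",
                   "2005-05-09", "2005-05-09", "2005-05-09", "2005-05-09"))
ndist <- tapply(seq_along(dates), dates, length)

# Route 1: ndist is still the named 1-D array, so read its names directly
counts1 <- data.frame(date_only = as.Date(names(ndist)), ndist = as.vector(ndist))

# Route 2: ndist was already converted with as.data.frame(), so the dates
# sit in the row names; copy them into a real column
ndist_df <- as.data.frame(ndist)
ndist_df$date_only <- as.Date(row.names(ndist_df))
row.names(ndist_df) <- NULL  # optional: drop the now-redundant row names

print(counts1)
print(ndist_df)
```

Either way, the dates become an ordinary column that can be merged back onto the original data by date_only.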

电影里的梦 2024-12-19 10:09:55

How about something a bit more simple:

as.data.frame(table(unique(Kandula_for_n)$date_only))
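A sketch of what this returns, on a small made-up data frame (the values are hypothetical). By default the columns are called Var1 and Freq; naming the argument to table() and setting responseName in as.data.frame() gives more readable column names. Two caveats: the date column comes back as a factor, and unique() drops fully duplicated rows before counting, so exact duplicate records would be counted once, unlike the tapply() version in the question.

```r
# Small hypothetical data set standing in for Kandula_for_n
Kandula_for_n <- data.frame(
  date_only = as.Date(c("2005-05-08", "2005-05-08", "2005-05-08",
                        "2005-05-09", "2005-05-09")),
  dist = c(138.6, 1166.9, 1270.7, 233.2, 1900.0)
)

# As posted: generic column names Var1 (factor) and Freq (integer)
default_counts <- as.data.frame(table(unique(Kandula_for_n)$date_only))
print(default_counts)

# Naming the table() argument and setting responseName renames the columns
counts <- as.data.frame(table(date_only = unique(Kandula_for_n)$date_only),
                        responseName = "n")
print(counts)
```

If a real Date column is needed afterwards, as.Date(as.character(counts$date_only)) converts the factor back.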

心意如水 2024-12-19 10:09:54

Simply because I find tapply() hard to wrap my brain around, I like using plyr for these types of things:

## make up some data
## you get better/faster/more answers if you do this bit for us :)
dates <- seq(Sys.Date(), Sys.Date() + 5, by = 1)
Kandula_for_n <- data.frame(date_only = sample( dates + 5, 10, replace=TRUE ) , dist=rnorm(10) )

library(plyr)  # library() stops with an error if plyr is missing; require() only warns
ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )

This will give you something like:

    date_only       dist ndist
1  2011-10-30  0.2434168     5
2  2011-10-30 -0.9361780     5
3  2011-10-30  1.4593197     5
4  2011-10-30 -0.1851402     5
5  2011-10-30  0.6652419     5
6  2011-10-31  0.8876420     1
7  2011-11-03  0.5087175     2
8  2011-11-03 -1.0065152     2
9  2011-11-04  0.4236352     2
10 2011-11-04  0.4535686     2

The ddply line:

ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )

takes the input data, groups it by the date_only field, and for every unique value applies the anonymous function to a data frame made up of only the records with that date_only value. My anonymous function simply takes the data frame x and appends a column named ndist, which is the number of rows in x.
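For comparison, a base R sketch that produces the same per-row ndist column without plyr. This swaps in ave(), which is not part of the answer above: ave() applies FUN within each date_only group and returns a vector aligned with the original rows, so it keeps the original row order, whereas the ddply output above comes back sorted by date_only. The fixed start date and set.seed() call are assumptions added only to make the made-up data reproducible.

```r
# Made-up data in the spirit of the ddply example (seeded for reproducibility)
set.seed(1)
dates <- seq(as.Date("2011-10-25"), by = "day", length.out = 6)
Kandula_for_n <- data.frame(date_only = sample(dates, 10, replace = TRUE),
                            dist = rnorm(10))

# ave() computes length() of each date_only group and recycles it to
# every row of that group, in the original row order
Kandula_for_n$ndist <- ave(seq_len(nrow(Kandula_for_n)),
                           Kandula_for_n$date_only, FUN = length)
print(Kandula_for_n)
```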
