Count events per day by group, including zeros, in R

Posted on 2025-01-29 23:15:43

I want one count column per ID with the number of events per day, including dates without events. This is an example of my data, although my real dataset has more than 100 IDs.

dt <- structure(list(
  id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  date = c("1/01/2000", "2/01/2000", "2/01/2000",
           "5/01/2000", "5/01/2000", "5/01/2000", "6/01/2000",
           "2/01/2000", "3/01/2000", "3/01/2000",
           "4/01/2000", "4/01/2000", "4/01/2000", "4/01/2000",
           "5/01/2000", "9/01/2000")),
  .Names = c("id", "date"),
  row.names = c(NA, -16L), class = "data.frame")

What I want is:

date       count 1  count 2
1/01/2000    1          0
2/01/2000    2          1
3/01/2000    0          2
4/01/2000    0          4
5/01/2000    3          1
6/01/2000    1          0
7/01/2000    0          0
8/01/2000    0          0
9/01/2000    0          1

My real data will cover the dates from 1/01/2000 to 31/12/2000. I want every ID to have all of these dates, even if there were zero events on some days.

Comments (3)

鸩远一方 2025-02-05 23:15:43

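Here is an approach using data.table. Note that the input used for this answer (shown at the end) stores the dates as year-month-day strings in 2020; for the question's day/month/year strings, use as.Date(date, "%d/%m/%Y") instead.

library(data.table)
# Convert to data.table by reference, parse the dates, and prefix the ids
setDT(dt)[, `:=`(date = as.Date(date, "%Y-%m-%d"), id = paste0("count", id))]
dcast(
  # Join onto every day in the observed range; days without events get id = NA
  dt[SJ(date = seq(min(date), max(date), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]  # drop the column created by the event-free days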

Output:

         date count1 count2
1: 2020-01-01      1      0
2: 2020-01-02      2      1
3: 2020-01-03      0      2
4: 2020-01-04      0      4
5: 2020-01-05      3      1
6: 2020-01-06      1      0
7: 2020-01-07      0      0
8: 2020-01-08      0      0
9: 2020-01-09      0      1

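If you already know the date range, as the post indicates, you can pass it directly. Start from a fresh copy of the input, because setDT() has already modified dt by reference and rerunning the paste0() step would prefix the ids a second time.

library(data.table)
setDT(dt)[, `:=`(date = as.Date(date, "%Y-%m-%d"), id = paste0("count", id))]
result = dcast(
  # Join onto every day of the known range instead of min(date)..max(date)
  dt[SJ(date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]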

Output:

           date count1 count2
  1: 2020-01-01      1      0
  2: 2020-01-02      2      1
  3: 2020-01-03      0      2
  4: 2020-01-04      0      4
  5: 2020-01-05      3      1
 ---                         
362: 2020-12-27      0      0
363: 2020-12-28      0      0
364: 2020-12-29      0      0
365: 2020-12-30      0      0
366: 2020-12-31      0      0

Input:

dt = structure(list(
  id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  date = c("2020-01-01", "2020-01-02", "2020-01-02",
           "2020-01-05", "2020-01-05", "2020-01-05", "2020-01-06",
           "2020-01-02", "2020-01-03", "2020-01-03",
           "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04",
           "2020-01-05", "2020-01-09")),
  row.names = c(NA, -16L), class = "data.frame")
风向决定发型 2025-02-05 23:15:43

We can use complete() to fill in the missing dates and then reshape to 'wide' with pivot_wider(). The example data stores 'date' as day/month/year (the question says the real data runs from 1/01/2000 to 31/12/2000), so it is parsed here with dmy(date) and the sequence is built by day. If the original data is in year-month-day format, change dmy(date) to ymd(date).

library(lubridate)
library(tidyr)
library(dplyr)
library(stringr)
dt %>% 
   # Parse day/month/year strings and give each event a weight of 1
   mutate(date = dmy(date), count = 1) %>% 
   group_by(id = str_c('count', id)) %>%
   # Add a zero-count row for every missing day within each id
   complete(date = seq(min(.$date, na.rm = TRUE), 
                       max(.$date, na.rm = TRUE), by = 'day'),
            fill = list(count = 0)) %>% 
   ungroup() %>%
   # One column per id, summing the counts for each day
   pivot_wider(names_from = id, values_from = count, 
               values_fn = sum, values_fill = 0)

Output:

# A tibble: 9 × 3
  date       count1 count2
  <date>      <dbl>  <dbl>
1 2000-01-01      1      0
2 2000-01-02      2      1
3 2000-01-03      0      2
4 2000-01-04      0      4
5 2000-01-05      3      1
6 2000-01-06      1      0
7 2000-01-07      0      0
8 2000-01-08      0      0
9 2000-01-09      0      1
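
The choice of parser matters for strings like those in the question. A small sketch of the difference, using two sample values plus the last date the question mentions (the variable names x, as_dmy, as_mdy, and bad are illustrative):

```r
library(lubridate)

x <- c("2/01/2000", "9/01/2000")

# Day/month/year reading: 2 January and 9 January 2000
as_dmy <- dmy(x)

# Month/day/year reading: 1 February and 1 September 2000
as_mdy <- mdy(x)

print(as_dmy)
print(as_mdy)

# A date such as 31/12/2000 has no valid month/day/year reading,
# so mdy() returns NA (with a parsing warning, suppressed here)
bad <- suppressWarnings(mdy("31/12/2000"))
print(bad)
```

Because the question's range ends on 31/12/2000, the strings only make sense as day/month/year.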
明媚殇 2025-02-05 23:15:43

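Here is a base R option using table + seq + factor. Turning the dates into a factor whose levels cover every day in the range makes table() report zeros for days without events: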

with(
    transform(
        dt,
        date = as.Date(date, "%d/%m/%Y")
    ),
    table(
        factor(date,
            levels = as.character(seq(min(date), max(date), 1))
        ),
        id
    )
)

which gives

            id
             1 2
  2000-01-01 1 0
  2000-01-02 2 1
  2000-01-03 0 2
  2000-01-04 0 4
  2000-01-05 3 1
  2000-01-06 1 0
  2000-01-07 0 0
  2000-01-08 0 0
  2000-01-09 0 1

Or we can go a step further with reshape + as.data.frame if we want data.frame output:

reshape(
    as.data.frame(
        with(
            transform(
                dt,
                date = as.Date(date, "%d/%m/%Y")
            ),
            table(
                factor(date,
                    levels = as.character(seq(min(date), max(date), 1))
                ),
                id
            )
        )
    ),
    idvar = "Var1",
    timevar = "id",
    direction = "wide"
)

which gives

        Var1 Freq.1 Freq.2
1 2000-01-01      1      0
2 2000-01-02      2      1
3 2000-01-03      0      2
4 2000-01-04      0      4
5 2000-01-05      3      1
6 2000-01-06      1      0
7 2000-01-07      0      0
8 2000-01-08      0      0
9 2000-01-09      0      1
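
Since the question states that the real data spans 1/01/2000 to 31/12/2000, the same table + factor idea can be sketched with that range fixed in advance rather than taken from min(date) and max(date). This is only a sketch under that assumption; all_days, event_days, tab, and wide are illustrative names, and as.data.frame.matrix() is used here in place of reshape() to get one column per id.

```r
# Sample data from the question (dates are day/month/year strings)
dt <- data.frame(
  id = c(rep(1L, 7), rep(2L, 9)),
  date = c("1/01/2000", "2/01/2000", "2/01/2000", "5/01/2000", "5/01/2000",
           "5/01/2000", "6/01/2000", "2/01/2000", "3/01/2000", "3/01/2000",
           "4/01/2000", "4/01/2000", "4/01/2000", "4/01/2000", "5/01/2000",
           "9/01/2000")
)

# Assumed known range: every day of the year 2000
all_days <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")
event_days <- as.Date(dt$date, "%d/%m/%Y")

# One factor level per calendar day, so table() also reports zero counts
tab <- table(
  factor(as.character(event_days), levels = as.character(all_days)),
  dt$id
)

# Turn the contingency table into a data.frame with one column per id
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("count", names(wide))
wide <- data.frame(date = all_days, wide)
rownames(wide) <- NULL

head(wide, 9)
```

The result has one row per day of the year and a Date column instead of the factor Var1 produced by reshape().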