Count events per day by group in R, including days with zero events

Posted 2025-01-29 23:15:43

I want one column per ID counting its events per day, including dates with no events. Here is a sample of my data; my real dataset has more than 100 IDs.

dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                            2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("1/01/2000", "2/01/2000", "2/01/2000",
                                                                          "5/01/2000", "5/01/2000", "5/01/2000", "6/01/2000", "2/01/2000", "3/01/2000", 
                                                                          "3/01/2000", "4/01/2000", "4/01/2000", "4/01/2000", "4/01/2000", 
                                                                          "5/01/2000", "9/01/2000")), .Names = c("id", "date"), 
                row.names = c(NA, -16L), class = "data.frame")

What I want is:

date       count 1  count 2
1/01/2000    1          0
2/01/2000    2          1
3/01/2000    0          2
4/01/2000    0          4
5/01/2000    3          1
6/01/2000    1          0
7/01/2000    0          0
8/01/2000    0          0
9/01/2000    0          1

My real data covers 1/01/2000 to 31/12/2000 (dates are day/month/year). I want every ID to have a count for all of these dates, even on days with zero events.

鸩远一方 2025-02-05 23:15:43

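Here is an approach using data.table. The example input (shown under Input at the end) stores dates as year-month-day strings; for the OP's day/month/year strings, use the format "%d/%m/%Y" in as.Date() instead.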

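library(data.table)
# Parse the dates and prefix the ids so the wide columns become count1, count2, ...
setDT(dt)[, `:=`(date = as.Date(date, "%Y-%m-%d"), id = paste0("count", id))]
# Right-join onto every date in the observed range (dates without events get id = NA),
# count rows per date and id, then drop the NA column that collects those empty dates
dcast(
  dt[SJ(date = seq(min(date), max(date), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]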

Output:

         date count1 count2
1: 2020-01-01      1      0
2: 2020-01-02      2      1
3: 2020-01-03      0      2
4: 2020-01-04      0      4
5: 2020-01-05      3      1
6: 2020-01-06      1      0
7: 2020-01-07      0      0
8: 2020-01-08      0      0
9: 2020-01-09      0      1

If you know the full date range in advance, as the post indicates, you can use it directly:

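library(data.table)
# Start again from the original dt: setDT() and := modify dt in place,
# so running this after the previous block would prefix the ids twice
setDT(dt)[, `:=`(date = as.Date(date, "%Y-%m-%d"), id = paste0("count", id))]
# Join onto the full calendar year instead of the observed min/max range
result = dcast(
  dt[SJ(date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]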

Output:

           date count1 count2
  1: 2020-01-01      1      0
  2: 2020-01-02      2      1
  3: 2020-01-03      0      2
  4: 2020-01-04      0      4
  5: 2020-01-05      3      1
 ---                         
362: 2020-12-27      0      0
363: 2020-12-28      0      0
364: 2020-12-29      0      0
365: 2020-12-30      0      0
366: 2020-12-31      0      0

Input:

dt = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
                      2L, 2L, 2L, 2L, 2L, 2L), date = c("2020-01-01", "2020-01-02", "2020-01-02", 
                                                        "2020-01-05", "2020-01-05", "2020-01-05", "2020-01-06", "2020-01-02", "2020-01-03", 
                                                        "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-05", 
                                                        "2020-01-09")), row.names = c(NA, -16L), class = "data.frame")
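The key step above is the join onto a complete calendar: every day is kept, and days without events get id = NA. As a package-free sketch of the same join-then-count idea (a swapped-in base R technique, not the answer's data.table code), merge() with all.x = TRUE performs the join and table() then leaves the NA ids out of the counts while keeping their dates as rows of zeros. It assumes the OP's original day/month/year strings, reproduced from the question; the names events, calendar, joined and counts are illustrative.

```r
# OP's sample data, with the day/month/year strings parsed to Date
events <- data.frame(
  id = rep(1:2, times = c(7, 9)),
  date = as.Date(c("1/01/2000", "2/01/2000", "2/01/2000", "5/01/2000",
                   "5/01/2000", "5/01/2000", "6/01/2000", "2/01/2000",
                   "3/01/2000", "3/01/2000", "4/01/2000", "4/01/2000",
                   "4/01/2000", "4/01/2000", "5/01/2000", "9/01/2000"),
                 format = "%d/%m/%Y")
)

# Complete calendar covering the observed range, one row per day
calendar <- data.frame(date = seq(min(events$date), max(events$date), by = "day"))

# Keep every calendar row: days without events are kept with id = NA
joined <- merge(calendar, events, by = "date", all.x = TRUE)

# table() leaves the NA ids out of the counts, but every date stays as a row,
# so days without events show up as zeros
counts <- unclass(table(joined$date, joined$id))
colnames(counts) <- paste0("count", colnames(counts))
print(counts)
```

Unlike the data.table version, there is no NA column to drop here, because table() excludes NA values by default.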

风向决定发型 2025-02-05 23:15:43

We may use complete and then reshape to 'wide' with pivot_wider. The OP's example dates are day/month/year (the real data runs to 31/12/2000), so parse them with dmy(date) and complete the sequence by day. If the original data is in year-month-day format, change dmy(date) to ymd(date).

library(lubridate)
library(tidyr)
library(dplyr)
library(stringr)
dt %>% 
   # parse day/month/year strings; each row is one event
   mutate(date = dmy(date), count = 1) %>% 
   group_by(id = str_c('count', id)) %>%
   # add every day of the overall range to each id, with count 0
   complete(date = seq(min(.$date, na.rm = TRUE), 
                      max(.$date, na.rm = TRUE), by = 'day'),
    fill = list(count = 0)) %>% 
   ungroup %>%
   # sum the events per day into one column per id
   pivot_wider(names_from = id, values_from = count, 
        values_fn = sum, values_fill = 0)

Output:

# A tibble: 9 × 3
  date       count1 count2
  <date>      <dbl>  <dbl>
1 2000-01-01      1      0
2 2000-01-02      2      1
3 2000-01-03      0      2
4 2000-01-04      0      4
5 2000-01-05      3      1
6 2000-01-06      1      0
7 2000-01-07      0      0
8 2000-01-08      0      0
9 2000-01-09      0      1
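A quick base R check of why the parse order matters for the OP's data: the same string yields a different date under day-first and month-first formats, and a month-first parse of 31/12/2000 returns NA. The base R format strings %d/%m/%Y and %m/%d/%Y stand in here for lubridate's dmy() and mdy(); the variable names are illustrative.

```r
x <- "2/01/2000"

# Day first (the order lubridate::dmy() assumes): 2 January 2000
day_first <- as.Date(x, format = "%d/%m/%Y")

# Month first (the order lubridate::mdy() assumes): 1 February 2000
month_first <- as.Date(x, format = "%m/%d/%Y")

# 31 cannot be read as a month, so the month-first parse gives NA
last_day <- as.Date("31/12/2000", format = "%m/%d/%Y")

print(c(day_first, month_first))
print(last_day)
```

With the small sample both orders happen to parse every string, which is why a month-first parse can silently produce monthly instead of daily dates; the last day of the real range exposes the mistake.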

明媚殇 2025-02-05 23:15:43

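Here is a base R option using table + seq + factor. Turning the dates into a factor whose levels cover every day in the range makes table() report 0 for days without events: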

with(
    transform(
        dt,
        date = as.Date(date, "%d/%m/%Y")
    ),
    table(
        factor(date,
            levels = as.character(seq(min(date), max(date), 1))
        ),
        id
    )
)

which gives

            id
             1 2
  2000-01-01 1 0
  2000-01-02 2 1
  2000-01-03 0 2
  2000-01-04 0 4
  2000-01-05 3 1
  2000-01-06 1 0
  2000-01-07 0 0
  2000-01-08 0 0
  2000-01-09 0 1

Or we can go one step further with as.data.frame + reshape if we want a data.frame:

reshape(
    as.data.frame(
        with(
            transform(
                dt,
                date = as.Date(date, "%d/%m/%Y")
            ),
            table(
                factor(date,
                    levels = as.character(seq(min(date), max(date), 1))
                ),
                id
            )
        )
    ),
    idvar = "Var1",
    timevar = "id",
    direction = "wide"
)

which gives

        Var1 Freq.1 Freq.2
1 2000-01-01      1      0
2 2000-01-02      2      1
3 2000-01-03      0      2
4 2000-01-04      0      4
5 2000-01-05      3      1
6 2000-01-06      1      0
7 2000-01-07      0      0
8 2000-01-08      0      0
9 2000-01-09      0      1
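For the OP's full-year requirement, the same table + factor idea works with a fixed range in place of min(date)/max(date). A minimal sketch, assuming day/month/year input strings and the range 1/01/2000 to 31/12/2000 from the question; the names all_days, day_factor, tab and counts are illustrative, and the columns are named count1, count2 with paste0 to match the desired output.

```r
# OP's sample data (day/month/year strings)
dt <- data.frame(
  id = rep(1:2, times = c(7, 9)),
  date = c("1/01/2000", "2/01/2000", "2/01/2000", "5/01/2000",
           "5/01/2000", "5/01/2000", "6/01/2000", "2/01/2000",
           "3/01/2000", "3/01/2000", "4/01/2000", "4/01/2000",
           "4/01/2000", "4/01/2000", "5/01/2000", "9/01/2000")
)

# Every day of 2000 (a leap year, so 366 days)
all_days <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")

# Factor levels cover the whole year, so days without events are counted as 0
day_factor <- factor(as.character(as.Date(dt$date, format = "%d/%m/%Y")),
                     levels = as.character(all_days))
tab <- table(day_factor, paste0("count", dt$id))

# One row per day, one column per id (row.names = NULL drops the date row names)
counts <- data.frame(date = all_days, unclass(tab), row.names = NULL)
head(counts, 10)
```

With 100+ IDs the same code simply produces more countN columns, since table() creates one column per distinct id.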
