Counting the number of days of service use in each month
I'm currently rearranging health service data. My data frame includes the start and end dates of service use for each individual:
id <- c("A", "A", "B")
start <- c("2018-04-01", "2019-04-02", "2018-09-01")
end <- c("2019-04-01", "2019-04-05", "2018-09-02")
df <- data.frame(id, start, end)
id start end
A 2018-04-01 2019-04-01
A 2019-04-02 2019-04-05
B 2018-09-01 2018-09-02
I want to do the following: (1) calculate the number of days in each month for each period of service use; (2) calculate the days of service use for each individual; (3) construct new columns for all possible months; and (4) generate a new data frame. The ultimate goal is to construct the following data frame:
id 2018_Jan 2018_Feb 2018_Mar 2018_Apr 2018_May 2018_Jun ... 2018_Sep ... 2019_Sep
A 0 0 0 30 31 31 ... 30 ... 1
B 0 0 0 0 0 0 ... 1 ... 0
The lubridate package and the function command should be helpful for this. My question is similar to the post "Count the number of days in each month of a date range", which counted the number of days in each month. However, I'm not sure how to apply it to build the data frame that I want.
I will be really grateful for your help on this.
Comments (2)
Here's a {tidyverse} solution.
Use dplyr::summarize() and seq() to generate the full range of dates for each observation.
Use end - 1 in seq() to not include the end date in the count, consistent with your example.
Convert each date to its month with lubridate::floor_date(unit = "month") (technically, this changes each date to the first of the month).
Use dplyr::count() to add up month-days for each id.
Add unobserved months with tidyr::complete().
Use tidyr::pivot_wider() to get a column for each month.
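The answer's own code did not survive on this page, so the following is only a minimal sketch of the {tidyverse} steps it lists, not the author's original code. It assumes the dates are first converted with as.Date(), swaps the dplyr::summarize() step for Map() plus tidyr::unnest() to expand each row into daily dates, and assumes a month range of January 2018 to December 2019 for tidyr::complete(). The names all_months and wide are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- data.frame(
  id    = c("A", "A", "B"),
  start = as.Date(c("2018-04-01", "2019-04-02", "2018-09-01")),
  end   = as.Date(c("2019-04-01", "2019-04-05", "2018-09-02"))
)

# Assumed month range; adjust to the span your data should cover
all_months <- seq(as.Date("2018-01-01"), as.Date("2019-12-01"), by = "month")

wide <- df %>%
  # One row per day of use; end - 1 leaves the end date out of the count
  mutate(date = Map(function(s, e) seq(s, e - 1, by = "day"), start, end)) %>%
  unnest(date) %>%
  # Collapse each date to the first day of its month
  mutate(ym = floor_date(date, unit = "month")) %>%
  # Count the days per id and month
  count(id, ym) %>%
  # Add months with no service use, filled with 0
  complete(id, ym = all_months, fill = list(n = 0L)) %>%
  # Label months as e.g. "2018_Apr"; month.abb avoids locale-dependent names
  mutate(ym = paste0(year(ym), "_", month.abb[month(ym)])) %>%
  pivot_wider(names_from = ym, values_from = n)

print(wide)
```

With the example data, wide has one row per id and 24 month columns; A gets 30 under 2018_Apr and B gets 1 under 2018_Sep.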
Here's one way. First, I make all combinations of id and year-month from Jan 2018 to Dec 2019. Then, I summarize the data by id and year-month. Finally, I join the two datasets together (to make sure the months where nothing happened are captured) and then pivot wider.
Created on 2022-03-04 by the reprex package (v2.0.1)
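The code for this second answer is also missing from the page. A rough sketch of the approach it describes (grid of combinations, per-month summary, join, pivot wider) might look like the following. The helper ym_label and the names grid, days, and wide are hypothetical, and excluding the end date with end - 1 is an assumption carried over from the question's expected output.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id    = c("A", "A", "B"),
  start = as.Date(c("2018-04-01", "2019-04-02", "2018-09-01")),
  end   = as.Date(c("2019-04-01", "2019-04-05", "2018-09-02"))
)

# Hypothetical helper: label a date as e.g. "2018_Apr" (locale-independent)
ym_label <- function(d) {
  paste0(format(d, "%Y"), "_", month.abb[as.integer(format(d, "%m"))])
}

# Step 1: all combinations of id and year-month, Jan 2018 to Dec 2019
months <- seq(as.Date("2018-01-01"), as.Date("2019-12-01"), by = "month")
grid <- expand_grid(id = unique(df$id), ym = ym_label(months))

# Step 2: summarize days of use by id and year-month
# (assumption: the end date itself is not counted)
days <- df %>%
  mutate(date = Map(function(s, e) seq(s, e - 1, by = "day"), start, end)) %>%
  unnest(date) %>%
  mutate(ym = ym_label(date)) %>%
  count(id, ym, name = "days")

# Step 3: join onto the grid so empty months appear as 0, then pivot wider
wide <- grid %>%
  left_join(days, by = c("id", "ym")) %>%
  mutate(days = coalesce(days, 0L)) %>%
  pivot_wider(names_from = ym, values_from = days)

print(wide)
```

Starting from the full grid and using left_join() is what keeps the months with no service use in the result; joining in the other direction would drop them.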