Count of active items on a given day from start and stop dates
I have a dataframe with an id column and two date columns, similar to the one below.
+------+------------+------------+
| id   | start_date | stop_date  |
+------+------------+------------+
| Foo  | 2019-06-01 | 2019-06-03 |
| Bar  | 2019-06-07 | 2019-06-10 |
| Pop  | 2019-06-09 | 2019-06-11 |
| Bob  | 2019-06-13 |            |
| Tom  | 2019-06-01 | 2019-06-05 |
| Tim  | 2019-06-04 | 2019-06-05 |
| Ben  | 2019-06-07 | 2019-06-09 |
| Ted  | 2019-06-08 | 2019-06-09 |
+------+------------+------------+
I need to return two dataframes: one with the count of active items for each day in the date range (example below)
+------------+--------+
| Day        | Active |
+------------+--------+
| 2019-06-01 |      2 |
| 2019-06-02 |      2 |
| 2019-06-03 |      2 |
| 2019-06-04 |      2 |
| 2019-06-05 |      2 |
| 2019-06-06 |      0 |
| 2019-06-07 |      2 |
| 2019-06-08 |      3 |
| 2019-06-09 |      4 |
| 2019-06-10 |      2 |
| 2019-06-11 |      1 |
| 2019-06-12 |      0 |
| 2019-06-13 |      1 |
| 2019-06-14 |      1 |
| 2019-06-15 |      1 |
+------------+--------+
and another containing the items that are active on a given date, e.g. 2019-06-10 returns:
| Bar | 2019-06-07 | 2019-06-10 |
| Pop | 2019-06-09 | 2019-06-11 |
So far I have tried this for the second dataframe:
active_date = pd.Timestamp('2019-06-10')
df_active = df[(df['start_date'] <= active_date) & ((df['stop_date'].isnull()) | (df['stop_date'] > active_date))]
Any help is appreciated!
You can do this:
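A minimal sketch of one way to build the per-day counts, not necessarily the code this answer originally showed. It assumes the report covers 2019-06-01 through 2019-06-15 (the range in the question's expected table), that an empty stop_date means the item is still active, and that both the start and stop days count as active. The names `days`, `stop`, and `daily_counts` are illustrative.

```python
import pandas as pd

# Sample data from the question; Bob has no stop date.
df = pd.DataFrame({
    "id": ["Foo", "Bar", "Pop", "Bob", "Tom", "Tim", "Ben", "Ted"],
    "start_date": pd.to_datetime([
        "2019-06-01", "2019-06-07", "2019-06-09", "2019-06-13",
        "2019-06-01", "2019-06-04", "2019-06-07", "2019-06-08",
    ]),
    "stop_date": pd.to_datetime([
        "2019-06-03", "2019-06-10", "2019-06-11", None,
        "2019-06-05", "2019-06-05", "2019-06-09", "2019-06-09",
    ]),
})

# Assumption: the reporting window matches the question's expected table.
days = pd.date_range("2019-06-01", "2019-06-15", freq="D")

# Assumption: a missing stop date means "active through the end of the window".
stop = df["stop_date"].fillna(days[-1])

# For each day, count the rows whose [start, stop] span includes it (both ends inclusive).
active = [int(((df["start_date"] <= day) & (stop >= day)).sum()) for day in days]

daily_counts = pd.DataFrame({"Day": days, "Active": active})
print(daily_counts)
```

This loops once per day, which is fine for a few weeks of data; for long ranges or many rows a vectorised approach would likely be preferable.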
Output:
And, use pd.IntervalIndex:
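A hedged sketch of how pd.IntervalIndex could select the rows active on a given date; it may differ from the code this answer originally contained. It assumes an empty stop_date means the item is still active, and it fills those gaps with a far-future placeholder date (a hypothetical choice) because `IntervalIndex.from_arrays` does not accept a missing value on only one side.

```python
import pandas as pd

# Sample data from the question; Bob has no stop date.
df = pd.DataFrame({
    "id": ["Foo", "Bar", "Pop", "Bob", "Tom", "Tim", "Ben", "Ted"],
    "start_date": pd.to_datetime([
        "2019-06-01", "2019-06-07", "2019-06-09", "2019-06-13",
        "2019-06-01", "2019-06-04", "2019-06-07", "2019-06-08",
    ]),
    "stop_date": pd.to_datetime([
        "2019-06-03", "2019-06-10", "2019-06-11", None,
        "2019-06-05", "2019-06-05", "2019-06-09", "2019-06-09",
    ]),
})

active_date = pd.Timestamp("2019-06-10")

# Assumption: open-ended rows run to a far-future placeholder date.
stop = df["stop_date"].fillna(pd.Timestamp("2099-12-31"))

# closed="both" makes the start and stop days themselves count as active.
intervals = pd.IntervalIndex.from_arrays(df["start_date"], stop, closed="both")

# contains() returns one boolean per interval, usable as a row mask.
df_active = df[intervals.contains(active_date)]
print(df_active)
```

The attempt in the question compares `stop_date > active_date`, which drops Bar because it stops on 2019-06-10; an inclusive comparison (`>=`, or `closed="both"` here) keeps it, matching the expected result of Bar and Pop.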
Output: