Count the number of open cases at a given datetime in pandas

Posted 2025-01-12 01:48:49 · 755 characters · 0 views · 0 comments

I have a DataFrame with the start date and closed date of each case. For each case, I want to count how many earlier cases are still open at the moment it starts.

caseNo  startDate   closedDate   
1       2019-01-01  2019-01-03   
2       2019-01-02  2019-01-10   
3       2019-01-03  2019-01-04   
4       2019-01-05  2019-01-10   
5       2019-01-06  2019-01-10   
6       2019-01-07  2019-01-12   
7       2019-01-11  2019-01-15

The expected output is:

caseNo  startDate   closedDate   numCases
1       2019-01-01  2019-01-03   0
2       2019-01-02  2019-01-10   1
3       2019-01-03  2019-01-04   1
4       2019-01-05  2019-01-10   1
5       2019-01-06  2019-01-10   2
6       2019-01-07  2019-01-12   3
7       2019-01-11  2019-01-15   1

For example, when case 6 starts, cases 2, 4 and 5 have not yet been closed, so there are 3 cases outstanding.
Also, the columns actually hold datetimes rather than just dates; I have only shown the dates here for brevity.
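
For anyone who wants to reproduce this, the sample above could be built roughly as follows. This is only a sketch: the variable name `df` matches what the answers below use, the midnight timestamps stand in for the real datetimes, and `expected` is a hypothetical helper list that just copies the numCases column from the question.

```python
import pandas as pd

# Sample data from the question; midnight timestamps stand in for the real datetimes.
df = pd.DataFrame({
    "caseNo": [1, 2, 3, 4, 5, 6, 7],
    "startDate": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-05",
        "2019-01-06", "2019-01-07", "2019-01-11",
    ]),
    "closedDate": pd.to_datetime([
        "2019-01-03", "2019-01-10", "2019-01-04", "2019-01-10",
        "2019-01-10", "2019-01-12", "2019-01-15",
    ]),
})

# Expected counts copied from the question, kept separate so answers can be checked.
expected = [0, 1, 1, 1, 2, 3, 1]

print(df.dtypes)
```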

尴尬癌患者 2025-01-19 01:48:49

A numba solution should improve performance (best to test on your real data). It compares each case only with the rows above it, so it assumes the frame is sorted by startDate:

import numpy as np
from numba import jit

@jit(nopython=True)
def nb_func(x, y):
    # x: close times, y: start times (both as NumPy arrays)
    res = np.empty(x.size, dtype=np.int64)
    for i in range(x.size):
        # count the earlier rows that close after row i starts
        res[i] = np.sum(x[:i] > y[i])
    return res

df['case'] = nb_func(df['closedDate'].to_numpy(), df['startDate'].to_numpy())
print(df)
   caseNo  startDate closedDate  case
0       1 2019-01-01 2019-01-03     0
1       2 2019-01-02 2019-01-10     1
2       3 2019-01-03 2019-01-04     1
3       4 2019-01-05 2019-01-10     1
4       5 2019-01-06 2019-01-10     2
5       6 2019-01-07 2019-01-12     3
6       7 2019-01-11 2019-01-15     1
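
If the quadratic loop is still too slow on large data, one possible alternative is to count with sorted arrays instead. This is a sketch of a different technique, not the method above: the function name `count_open_cases` is hypothetical, and it assumes every case closes strictly after it starts. It also differs in one detail: cases that start at exactly the same time do not count each other, whereas the loop counts any earlier row that is still open. The open count at a start time t is taken as (cases started before t) minus (cases closed at or before t), each found with `np.searchsorted`.

```python
import numpy as np
import pandas as pd

def count_open_cases(start, closed):
    """Count cases open at each start time (hypothetical helper).

    Assumes each case closes strictly after it starts.
    """
    start = pd.to_datetime(start).to_numpy()
    closed = pd.to_datetime(closed).to_numpy()
    # Number of cases that started strictly before each start time.
    opened_before = np.searchsorted(np.sort(start), start, side="left")
    # Number of cases already closed at or before each start time.
    closed_by = np.searchsorted(np.sort(closed), start, side="right")
    return opened_before - closed_by

starts = ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-05",
          "2019-01-06", "2019-01-07", "2019-01-11"]
closes = ["2019-01-03", "2019-01-10", "2019-01-04", "2019-01-10",
          "2019-01-10", "2019-01-12", "2019-01-15"]
counts = count_open_cases(starts, closes)
print(counts.tolist())
```

Because both arrays are sorted inside the function, the input rows do not need to be ordered by startDate, and the work is O(n log n) rather than O(n²).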

活泼老夫 2025-01-19 01:48:49

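Use:

import pandas as pd

res = []
start = pd.to_datetime(df['startDate'])
closed = pd.to_datetime(df['closedDate'])
for i in range(len(df)):
    # earlier rows whose close time is after this case's start
    n_open = int((closed.iloc[:i] > start.iloc[i]).sum())
    print(n_open)
    res.append(n_open)

Output (for the sample data):

0
1
1
1
2
3
1

Then you can add the result as a df column:

df['numCases'] = res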
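
The same count can also be sketched without an explicit Python loop, using NumPy broadcasting. This is an illustration under assumptions rather than a tuned implementation: the function name `count_open_before` is hypothetical, and the comparison builds an n × n boolean matrix, so it is only practical for moderately sized frames. Like the loop, it looks only at the rows above each case, so it assumes the rows are ordered by startDate.

```python
import numpy as np
import pandas as pd

def count_open_before(start, closed):
    """For each row, count the rows above it that are still open (hypothetical helper)."""
    start = pd.to_datetime(start).to_numpy()
    closed = pd.to_datetime(closed).to_numpy()
    n = len(start)
    # still_open[i, j] is True when case j closes after case i starts.
    still_open = closed[None, :] > start[:, None]
    # Keep only pairs with j < i, i.e. the rows above row i.
    earlier = np.tril(np.ones((n, n), dtype=bool), k=-1)
    return (still_open & earlier).sum(axis=1)

starts = ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-05",
          "2019-01-06", "2019-01-07", "2019-01-11"]
closes = ["2019-01-03", "2019-01-10", "2019-01-04", "2019-01-10",
          "2019-01-10", "2019-01-12", "2019-01-15"]
num_cases = count_open_before(starts, closes)
print(num_cases.tolist())
```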
