pandas: fill in the missing time intervals given in a DataFrame

Posted on 2025-01-14 10:41:42

I have a DataFrame looking like:

gap_id  species  time_start           time_stop
1       wheat    2021-11-22 00:01:00  2021-11-22 00:03:00
2       fescue   2021-12-18 05:52:00  2021-12-18 05:53:00

I would like to expand the DataFrame such that I get as many rows as the number of minutes between time_start and time_stop for each gap_id:

gap_id  species  time
1       wheat    2021-11-22 00:01:00
1       wheat    2021-11-22 00:02:00
1       wheat    2021-11-22 00:03:00
2       fescue   2021-12-18 05:52:00
2       fescue   2021-12-18 05:53:00

I've tried pd.date_range, but I don't know how to combine it with a groupby on gap_id.

Thanks in advance

花海 2025-01-21 10:41:42

If the DataFrame is small and performance is not important, generate a date_range for each row and then use DataFrame.explode:

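import pandas as pd

# Build one minute-frequency range per row, then turn each timestamp into its own row.
# The 'min' alias is used because 'T' is deprecated in recent pandas versions.
df['time'] = df.apply(lambda x: pd.date_range(x['time_start'], x['time_stop'], freq='min'), axis=1)
df = df.drop(['time_start','time_stop'], axis=1).explode('time')

print(df)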
   gap_id species                time
0       1   wheat 2021-11-22 00:01:00
0       1   wheat 2021-11-22 00:02:00
0       1   wheat 2021-11-22 00:03:00
1       2  fescue 2021-12-18 05:52:00
1       2  fescue 2021-12-18 05:53:00
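
For anyone who wants to run the explode approach end to end, here is a minimal sketch. The sample frame is rebuilt from the table in the question; the variable name `out` and the final `pd.to_datetime` conversion are additions for illustration, on the assumption that `explode` may leave the column as object dtype.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "gap_id": [1, 2],
    "species": ["wheat", "fescue"],
    "time_start": pd.to_datetime(["2021-11-22 00:01:00", "2021-12-18 05:52:00"]),
    "time_stop": pd.to_datetime(["2021-11-22 00:03:00", "2021-12-18 05:53:00"]),
})

# One minute-frequency range per row (both endpoints included).
df["time"] = df.apply(
    lambda row: pd.date_range(row["time_start"], row["time_stop"], freq="min"),
    axis=1,
)

# One output row per timestamp in each range.
out = df.drop(columns=["time_start", "time_stop"]).explode("time")

# Assumption: explode may return object dtype, so convert back explicitly.
out["time"] = pd.to_datetime(out["time"])
out = out.reset_index(drop=True)
print(out)
```

Resetting the index at the end is optional; without it, the exploded rows keep the index label of the row they came from, as in the output above.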

For large DataFrames, first repeat each row by the number of minutes between the start and stop columns (inclusive), then add a per-row counter from GroupBy.cumcount, converted to timedeltas with to_timedelta:

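# Make sure both columns are real datetimes.
df['time_start'] = pd.to_datetime(df['time_start'])
df['time_stop'] = pd.to_datetime(df['time_stop'])

# Rows needed per gap: whole minutes between start and stop, plus one for the stop minute.
n_rows = (df['time_stop'].sub(df['time_start']).dt.total_seconds() // 60 + 1).astype(int)

# Repeat each row n_rows times; time_start becomes the base of the new time column.
df = (df.loc[df.index.repeat(n_rows)]
        .drop('time_stop', axis=1)
        .rename(columns={'time_start':'time'}))

# Counter 0, 1, 2, ... within each original row, expressed as minute offsets.
td = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='min')

df['time'] += td
df = df.reset_index(drop=True)
print(df)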
   gap_id species                time
0       1   wheat 2021-11-22 00:01:00
1       1   wheat 2021-11-22 00:02:00
2       1   wheat 2021-11-22 00:03:00
3       2  fescue 2021-12-18 05:52:00
4       2  fescue 2021-12-18 05:53:00
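
The repeat-and-count approach can also be wrapped in a small function, sketched below with the sample data from the question. The function name `expand_gaps` and the initial `reset_index` (so that repeated labels are guaranteed unique per input row) are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

def expand_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per minute between time_start and time_stop, inclusive."""
    # Work on a fresh frame with unique labels 0..n-1 (hypothetical safety step).
    df = df.reset_index(drop=True)
    df["time_start"] = pd.to_datetime(df["time_start"])
    df["time_stop"] = pd.to_datetime(df["time_stop"])

    # Whole minutes in each gap, plus one so the stop minute is included.
    span = df["time_stop"] - df["time_start"]
    n_rows = (span.dt.total_seconds() // 60 + 1).astype(int)

    # Repeat each row n_rows times; time_start is the base timestamp.
    out = (df.loc[df.index.repeat(n_rows.to_numpy())]
             .drop(columns="time_stop")
             .rename(columns={"time_start": "time"}))

    # Counter 0, 1, 2, ... within each original row, as minute offsets.
    counter = out.groupby(level=0).cumcount().to_numpy()
    out["time"] = out["time"] + pd.to_timedelta(counter, unit="min")
    return out.reset_index(drop=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "gap_id": [1, 2],
    "species": ["wheat", "fescue"],
    "time_start": ["2021-11-22 00:01:00", "2021-12-18 05:52:00"],
    "time_stop": ["2021-11-22 00:03:00", "2021-12-18 05:53:00"],
})
print(expand_gaps(df))
```

Because of the floor division, a trailing partial minute is dropped: a gap from 00:01:00 to 00:03:30 still yields three rows, which matches what date_range with a one-minute frequency produces for the same endpoints.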