Fill a DataFrame with the missing ranges between two columns

Posted on 2025-01-14 03:55:41

I have a DataFrame whose start and end Timestamp columns define the ranges used for audio clipping. It can be generated like this:

import pandas as pd

df = pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'),2: pd.Timestamp('1900-01-01 00:00:18.743000'), 3: pd.Timestamp('1900-01-01 00:00:21.383000'), 4: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:16.342000'), 1: pd.Timestamp('1900-01-01 00:00:18.543000'), 2: pd.Timestamp('1900-01-01 00:00:20.712000'), 3: pd.Timestamp('1900-01-01 00:00:22.482000'), 4: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                      start                     end
0   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
1   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
2   1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
3   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
4   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

I want to generate a DataFrame that also fills in the start and end timestamps of the missing ranges, that is, the ranges not covered by any of these entries.
Something like this:

pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:00.000000'), 1: pd.Timestamp('1900-01-01 00:00:14.373000'), 2: pd.Timestamp('1900-01-01 00:00:16.342000'), 3: pd.Timestamp('1900-01-01 00:00:18.543000'), 4: pd.Timestamp('1900-01-01 00:00:20.712000'), 5: pd.Timestamp('1900-01-01 00:00:21.383000'), 6: pd.Timestamp('1900-01-01 00:00:22.482000'), 7: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'), 2: pd.Timestamp('1900-01-01 00:00:18.543000'), 3: pd.Timestamp('1900-01-01 00:00:20.712000'), 4: pd.Timestamp('1900-01-01 00:00:21.383000'), 5: pd.Timestamp('1900-01-01 00:00:22.482000'), 6: pd.Timestamp('1900-01-01 00:00:22.812000'), 7: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                      start                     end
0   1900-01-01 00:00:00.000 1900-01-01 00:00:14.373
1   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3   1900-01-01 00:00:18.543 1900-01-01 00:00:20.712
4   1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
5   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
6   1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
7   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

I couldn't come up with any feasible solution other than iterating over the individual rows. What would be the best way to do this?

Comments (1)

余生一个溪 2025-01-21 03:55:41

IIUC, you could collect all the unique timestamps and build a new DataFrame by pairing each one with the value that precedes it:

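import numpy as np
# unique timestamps from both columns, in order of appearance
vals = df[['start', 'end']].stack().unique()
# prepend 0 (the epoch, 1970-01-01) so each value is paired with its predecessor
vals2 = np.concatenate([np.array([0], dtype=vals.dtype), vals])
df2 = pd.DataFrame(zip(vals2, vals), columns=['start', 'end'])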

Output (the first start is 1970-01-01 because 0 maps to the Unix epoch, not to 1900-01-01):

                    start                     end
0 1970-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:18.743
4 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
5 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
6 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
7 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
8 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
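
If the first range should start at 1900-01-01 00:00:00, as in the question's expected output, a variant along the following lines might work. This is only a sketch: the helper name fill_ranges and its origin parameter are made up for illustration, and it assumes the timeline begins at midnight of the day of the earliest timestamp. It also sorts the boundaries, so it does not rely on the input rows already being in order.

```python
import pandas as pd

def fill_ranges(df, origin=None):
    """Return contiguous start/end rows from origin to the last end (hypothetical helper)."""
    # Every boundary from both columns, de-duplicated and sorted.
    bounds = (
        pd.concat([df["start"], df["end"]])
        .drop_duplicates()
        .sort_values()
        .reset_index(drop=True)
    )
    # Assumption: the timeline starts at midnight of the first timestamp's day.
    if origin is None:
        origin = bounds.iloc[0].normalize()
    if origin < bounds.iloc[0]:
        bounds = pd.concat([pd.Series([origin]), bounds], ignore_index=True)
    # Pair each boundary with the one that follows it.
    return pd.DataFrame({
        "start": bounds.iloc[:-1].to_numpy(),
        "end": bounds.iloc[1:].to_numpy(),
    })

# Same sample data as in the question.
df = pd.DataFrame({
    "start": pd.to_datetime([
        "1900-01-01 00:00:14.373", "1900-01-01 00:00:16.342",
        "1900-01-01 00:00:18.743", "1900-01-01 00:00:21.383",
        "1900-01-01 00:00:22.812",
    ]),
    "end": pd.to_datetime([
        "1900-01-01 00:00:16.342", "1900-01-01 00:00:18.543",
        "1900-01-01 00:00:20.712", "1900-01-01 00:00:22.482",
        "1900-01-01 00:00:24.653",
    ]),
})

filled = fill_ranges(df)
print(filled)
```

With the sample data this gives nine rows, the same as the output above except that the first start is 1900-01-01 00:00:00. The expected output in the question has eight rows and does not list 18.743, so the short 18.543 to 18.743 gap shows up here as its own row.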