Fill a DataFrame with the missing ranges between two columns
I have a dataframe containing start and end Timestamps that act as ranges for audio clipping. It can be generated like this:
import pandas as pd
df = pd.DataFrame(
{'start':
{0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'),2: pd.Timestamp('1900-01-01 00:00:18.743000'), 3: pd.Timestamp('1900-01-01 00:00:21.383000'), 4: pd.Timestamp('1900-01-01 00:00:22.812000')},
'end':
{0: pd.Timestamp('1900-01-01 00:00:16.342000'), 1: pd.Timestamp('1900-01-01 00:00:18.543000'), 2: pd.Timestamp('1900-01-01 00:00:20.712000'), 3: pd.Timestamp('1900-01-01 00:00:22.482000'), 4: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                    start                     end
0 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
1 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
2 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
3 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
4 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
I want to generate a dataframe that also contains the start and end timestamps of the ranges not covered by these entries, i.e. the gaps between them, beginning at 00:00:00. Something like this:
pd.DataFrame(
{'start':
{0: pd.Timestamp('1900-01-01 00:00:00.000000'), 1: pd.Timestamp('1900-01-01 00:00:14.373000'), 2: pd.Timestamp('1900-01-01 00:00:16.342000'), 3: pd.Timestamp('1900-01-01 00:00:18.543000'), 4: pd.Timestamp('1900-01-01 00:00:20.712000'), 5: pd.Timestamp('1900-01-01 00:00:21.383000'), 6: pd.Timestamp('1900-01-01 00:00:22.482000'), 7: pd.Timestamp('1900-01-01 00:00:22.812000')},
'end':
{0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'), 2: pd.Timestamp('1900-01-01 00:00:18.543000'), 3: pd.Timestamp('1900-01-01 00:00:20.712000'), 4: pd.Timestamp('1900-01-01 00:00:21.383000'), 5: pd.Timestamp('1900-01-01 00:00:22.482000'), 6: pd.Timestamp('1900-01-01 00:00:22.812000'), 7: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                    start                     end
0 1900-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:20.712
4 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
5 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
6 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
7 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
I couldn't come up with any feasible solution other than iterating over individual rows. What would be the best way to do this?
IIUC, you could collect all unique timestamps from both columns, sort them, and generate a new dataframe by pairing each timestamp with the next one (the shifted values).
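A minimal sketch of that idea, not necessarily the exact code the answer had in mind. It assumes the timeline should begin at 1900-01-01 00:00:00 (taken from the expected output in the question); the names `origin`, `bounds`, and `out` are illustrative only. Note that with the sample data exactly as posted, row 2 starts at 18.743 while row 1 ends at 18.543, so this produces nine rows (including 18.543 to 18.743), one more than the eight rows shown in the question.

```python
import pandas as pd

# Sample data from the question, written more compactly.
df = pd.DataFrame({
    "start": pd.to_datetime([
        "1900-01-01 00:00:14.373", "1900-01-01 00:00:16.342",
        "1900-01-01 00:00:18.743", "1900-01-01 00:00:21.383",
        "1900-01-01 00:00:22.812",
    ]),
    "end": pd.to_datetime([
        "1900-01-01 00:00:16.342", "1900-01-01 00:00:18.543",
        "1900-01-01 00:00:20.712", "1900-01-01 00:00:22.482",
        "1900-01-01 00:00:24.653",
    ]),
})

# Assumption: the filled timeline starts at midnight, as in the expected output.
origin = pd.Timestamp("1900-01-01 00:00:00")

# Every distinct boundary (origin, all starts, all ends), in ascending order.
bounds = (
    pd.Series([origin, *df["start"], *df["end"]])
    .drop_duplicates()
    .sort_values()
    .reset_index(drop=True)
)

# Pair each boundary with the next one; the last boundary has no successor,
# so its row (end == NaT) is dropped.
out = pd.DataFrame({"start": bounds, "end": bounds.shift(-1)}).iloc[:-1]

print(out)
```

Because each row's `end` is the next row's `start`, the result covers the whole span from `origin` to the last `end` without gaps and without an explicit Python loop over rows. This relies on the input ranges not overlapping; overlapping clips would be split at every boundary.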