How do I delete a row whose datetime value is within a certain time of another row?
If I have a DataFrame as below:
| | Letter | Time |
|---|---|---|
| 0 | x | 2021-01-01 14:00:00 |
| 1 | y | 2021-01-01 18:00:00 |
| 2 | y | 2021-01-03 14:00:00 |
How would I delete a row if its value in the Time column (datetime) is within, say, 14 hours of the time in the row above?
I've tried using:
```python
from datetime import datetime, timedelta

for i, row in enumerate(df):
    if i > 0:
        if df.at[i, 'Time'] - df.at[i-1, 'Time'] < timedelta(hours=14):
            df = df.drop(i)
        else:
            pass
    else:
        pass
```
but I get `KeyError: 1` on the line

```python
if df.at[i, 'Time'] - df.at[i-1, 'Time'] < timedelta(hours=14):
```
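A likely explanation for the error, assuming `df` has a default `RangeIndex`: `enumerate(df)` walks over the column labels rather than the rows, and once `df.drop(i)` has removed a row, a later `df.at[i-1, 'Time']` asks for a label that no longer exists, which can produce `KeyError: 1`. A minimal sketch that avoids both problems iterates over the row labels, collects the labels to remove, and drops them once after the loop. It compares each row with the row directly above it in the original frame; the sample frame is rebuilt from the table above.

```python
from datetime import timedelta

import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Letter": ["x", "y", "y"],
    "Time": pd.to_datetime(["2021-01-01 14:00:00",
                            "2021-01-01 18:00:00",
                            "2021-01-03 14:00:00"]),
})

labels = list(df.index)  # iterate over row labels, not column labels
to_drop = []
for prev, cur in zip(labels, labels[1:]):
    # compare each row with the row directly above it in the original frame
    if df.at[cur, "Time"] - df.at[prev, "Time"] < timedelta(hours=14):
        to_drop.append(cur)

result = df.drop(to_drop)  # drop once, after the loop has finished reading df
print(result)
```

Dropping after the loop means every `df.at` lookup still sees all the original labels.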
If a timestamp is within 14 hours of an earlier timestamp, does its removal depend on whether the earlier timestamp is removed or not? This answer assumes the answer to that question is "yes". (If the answer is "no", the result for the test data below would be the first timestamp only.)
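The difference between the two readings can be seen on a small hypothetical series of timestamps spaced 10 hours apart (made up for illustration, not the test data used in this answer). Under the "yes" reading each timestamp is compared with the last one that was kept; under the "no" reading it is compared with every earlier timestamp, which for sorted data is the same as comparing it with the one directly before it.

```python
import pandas as pd

# Hypothetical timestamps 10 hours apart (illustration only)
ts = pd.Series(pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 10:00",
    "2021-01-01 20:00", "2021-01-02 06:00",
]))
window = pd.Timedelta(hours=14)

# "yes": compare each timestamp with the last one that was kept
kept_yes = []
for t in ts:
    if not kept_yes or t - kept_yes[-1] >= window:
        kept_yes.append(t)

# "no": compare with every earlier timestamp; for sorted data the nearest
# earlier one is the timestamp directly before it, so a diff is enough
gaps = ts.diff()
kept_no = ts[gaps.isna() | (gaps >= window)].tolist()

print(kept_yes)  # 2021-01-01 00:00 and 2021-01-01 20:00
print(kept_no)   # 2021-01-01 00:00 only
```

Because every gap here is 10 hours, the "no" reading removes everything after the first timestamp, while the "yes" reading lets 20:00 survive once 10:00 has been removed.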
setup

The test data is stored in `timestamps`. The solution we are aiming for consists of the 1st, 4th, 6th and 8th timestamps.
solution
This solution uses the piso (pandas interval set operations) package. The idea is to create a 14-hour window, i.e. an interval, for each of your timestamps and iteratively remove timestamps that belong to intervals starting earlier.
`mat` will be a dataframe whose index and columns are the intervals built from `timestamps`. Set the diagonal of this matrix to True.
We start with the first interval. From the first row of `mat` you can deduce that the second and third intervals need to be dropped. So we filter out the rows and columns corresponding to these intervals, then move to the next interval (row), and so on until we reach the last row. Note that we do not need to check any intersections for the last row. The result will be a dataframe whose values are all True. More importantly, the index (and columns) will be intervals whose left endpoints are the timestamps remaining after removing those within 14 hours.
That is, `pd.Series(mat.index.left)` gives the remaining timestamps, which you can use to filter your original dataframe with `pandas.Series.isin`.
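The matrix procedure can be sketched without piso. In this hypothetical `drop_within_window` helper, NumPy broadcasting builds the overlap test that the answer gets from piso, so it is a stand-in and not piso's API. Because the windows are left-closed intervals `[t, t + 14h)` of equal length, two of them overlap exactly when their start times are less than 14 hours apart. Here `mat` is True where two intervals do not overlap, with the diagonal set to True, so filtering by a row keeps that interval and drops the later ones that overlap it; at the end every value is True. The sketch assumes the `Time` column is tz-naive `datetime64`, sorted ascending, and free of duplicates.

```python
import numpy as np
import pandas as pd


def drop_within_window(df, col="Time", hours=14):
    """Keep rows whose timestamp is not inside the window of an earlier kept row.

    Hypothetical helper: NumPy broadcasting stands in for piso here.
    Assumes df[col] is tz-naive datetime64, sorted ascending, without duplicates.
    """
    starts = df[col].to_numpy()
    window = np.timedelta64(hours, "h")
    # One left-closed interval [t, t + window) per timestamp.
    intervals = pd.IntervalIndex.from_arrays(starts, starts + window, closed="left")
    # Equal-length intervals overlap exactly when their starts are < window apart,
    # so True here means "does not overlap".
    apart = np.abs(starts[:, None] - starts[None, :]) >= window
    np.fill_diagonal(apart, True)  # a row used as a filter must keep its own interval
    mat = pd.DataFrame(apart, index=intervals, columns=intervals)

    pos = 0
    while pos < len(mat) - 1:  # the last row never needs checking
        keep = mat.iloc[pos].to_numpy()  # False marks later intervals overlapping this one
        mat = mat.iloc[keep, keep]       # drop those rows and columns
        pos += 1

    # mat is now all True; its left endpoints are the timestamps to keep.
    return df[df[col].isin(mat.index.left)]


df = pd.DataFrame({
    "Letter": ["x", "y", "y"],
    "Time": pd.to_datetime(["2021-01-01 14:00:00",
                            "2021-01-01 18:00:00",
                            "2021-01-03 14:00:00"]),
})
result = drop_within_window(df)
print(result)  # rows 0 and 2
```

Using the question's three rows: the 18:00 timestamp falls inside the window opened at 14:00, so it is dropped, while 2021-01-03 14:00 is 48 hours later and survives.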
Note: I am the creator of piso. Please feel free to reach out with feedback or questions if you have any.
You could create a boolean mask by using `shift` + `rsub` (finds the difference between consecutive times) + `div` (converts to hours) and filter with it.
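A sketch of that idea, assuming the `Time` column is already `datetime64` and using 14 hours as the threshold; the exact expression from the original answer is not shown, so this is a reconstruction. `s.rsub(other)` computes `other - s`, so `shift().rsub(...)` gives each row's gap to the row directly above it. Like the question's loop, it compares each row only with its original predecessor, regardless of whether that row is itself removed.

```python
import pandas as pd

df = pd.DataFrame({
    "Letter": ["x", "y", "y"],
    "Time": pd.to_datetime(["2021-01-01 14:00:00",
                            "2021-01-01 18:00:00",
                            "2021-01-03 14:00:00"]),
})

# Time - Time.shift(): gap to the row directly above (NaT for the first row),
# then divided by one hour to get a float number of hours
hours = df["Time"].shift().rsub(df["Time"]).div(pd.Timedelta(hours=1))

# hours is NaN, 4.0, 44.0; NaN < 14 is False, so ~lt(14) keeps the first row
result = df[~hours.lt(14)]
print(result)
```

Negating `lt(14)` instead of using `ge(14)` is a deliberate choice: the first row has no predecessor, and `ge(14)` would evaluate to False on its NaN and drop it.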