Remove all rows that have a non-unique timestamp
I am stuck on a bit of code for my dissertation which is due in a few days so help would be really appreciated.
I have a NumPy array that looks like this:
[['2017-01-30T06:00:00.000000000', 48.67, 55.04],
['2017-01-30T06:00:00.000000000', 49.55249735, 55.04],
['2017-01-30T07:00:00.000000000', 48.67262295, 55.04],
['2017-01-30T08:00:00.000000000', 48.67262295, 55.04],
['2017-01-30T08:00:00.000000000', 48.55544345, 55.04],
['2017-01-30T09:00:00.000000000', 48.67262295, 55.04]...
I am trying to check the timestamp of each row, and if any timestamp appears more than once, I want to remove all rows which have that timestamp. So the resulting array would look like this:
[['2017-01-30T07:00:00.000000000', 48.67262295, 55.04],
['2017-01-30T09:00:00.000000000', 48.67262295, 55.04]...
i.e. the rows at 6am and 8am are deleted because they appear more than once.
I have tried using np.unique but cannot get it to work. I have also tried looping through the array, checking whether each timestamp equals the previous one and then deleting both rows, but that fails when the same timestamp occurs three or more times.
I am really stuck for time so any help would be greatly, greatly appreciated.
The code I have tried so far is this:
def del_duplicate_rows(data):
    date_times = []
    for d in data:
        date_times.append(d)
        if len(date_times) > 1:
            if date_times[d] == date_times[d-1]:
                data = np.delete(data, d, axis=0)
                data = np.delete(data, d-1, axis=0)
    return data
Comments (4)
No need for pandas or numpy. All you need is Python's built-in itertools module.
Explanation:
Since the timestamps are sorted, we can use groupby() to group entries with the same timestamp together into one list. Then we iterate over these groups, retaining only those with a single entry. Finally, we flatten the remaining one-entry groups with chain.from_iterable() to obtain the desired result.
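The groupby() and chain.from_iterable() steps described above can be sketched roughly as follows. This is a reconstruction, not the answerer's original code; the function name `drop_repeated_timestamps` is made up for illustration, and the rows are taken from the question as a plain list of lists with the timestamp in the first field.

```python
from itertools import chain, groupby


def drop_repeated_timestamps(rows):
    """Keep only rows whose timestamp (first field) occurs exactly once.

    Assumption: rows are already sorted by timestamp, as in the question.
    groupby() only merges *adjacent* equal keys, so unsorted input would
    need a sort first.
    """
    # One list per run of identical timestamps.
    groups = (list(group) for _, group in groupby(rows, key=lambda row: row[0]))
    # Keep the runs that contain a single row.
    singles = (group for group in groups if len(group) == 1)
    # Flatten the one-element lists back into a flat list of rows.
    return list(chain.from_iterable(singles))


rows = [
    ['2017-01-30T06:00:00.000000000', 48.67, 55.04],
    ['2017-01-30T06:00:00.000000000', 49.55249735, 55.04],
    ['2017-01-30T07:00:00.000000000', 48.67262295, 55.04],
    ['2017-01-30T08:00:00.000000000', 48.67262295, 55.04],
    ['2017-01-30T08:00:00.000000000', 48.55544345, 55.04],
    ['2017-01-30T09:00:00.000000000', 48.67262295, 55.04],
]

result = drop_repeated_timestamps(rows)
for row in result:
    print(row)
```

Because whole groups are kept or dropped, a timestamp that appears three or more times is removed just like one that appears twice.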
If you happen to have the data in a pandas DataFrame already, you can do the same filtering there.
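One way this might look in pandas is `DataFrame.drop_duplicates` with `keep=False`, which drops every row whose key is duplicated instead of keeping the first or last one. This is a sketch under assumptions: the column names `timestamp`, `value_a` and `value_b` are invented here, since the question does not name the columns.

```python
import pandas as pd

# Sample rows from the question; the column names are hypothetical.
df = pd.DataFrame(
    [
        ['2017-01-30T06:00:00.000000000', 48.67, 55.04],
        ['2017-01-30T06:00:00.000000000', 49.55249735, 55.04],
        ['2017-01-30T07:00:00.000000000', 48.67262295, 55.04],
        ['2017-01-30T08:00:00.000000000', 48.67262295, 55.04],
        ['2017-01-30T08:00:00.000000000', 48.55544345, 55.04],
        ['2017-01-30T09:00:00.000000000', 48.67262295, 55.04],
    ],
    columns=['timestamp', 'value_a', 'value_b'],
)

# keep=False marks *all* occurrences of a repeated timestamp as duplicates,
# so only timestamps that occur exactly once survive.
unique_rows = df.drop_duplicates(subset='timestamp', keep=False)
print(unique_rows)
```

If a NumPy array is needed afterwards, `unique_rows.to_numpy()` converts the result back.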
To find the unique values and count them you can use np.unique(..., return_counts=True, return_index=True). Then select the values whose count == 1, take their first-occurrence indices, and use those indices to pick the matching rows from the original array.
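A minimal sketch of that np.unique recipe, not the answerer's original code. It assumes the data is a 2-D object array with the timestamp string in column 0, as in the question; the variable names are illustrative.

```python
import numpy as np

# Assumption: an object-dtype array so strings and floats can share rows.
data = np.array(
    [
        ['2017-01-30T06:00:00.000000000', 48.67, 55.04],
        ['2017-01-30T06:00:00.000000000', 49.55249735, 55.04],
        ['2017-01-30T07:00:00.000000000', 48.67262295, 55.04],
        ['2017-01-30T08:00:00.000000000', 48.67262295, 55.04],
        ['2017-01-30T08:00:00.000000000', 48.55544345, 55.04],
        ['2017-01-30T09:00:00.000000000', 48.67262295, 55.04],
    ],
    dtype=object,
)

timestamps = data[:, 0].astype(str)

# first_index[i] is the row of the first occurrence of the i-th unique value,
# counts[i] is how many times that value occurs.
_, first_index, counts = np.unique(
    timestamps, return_index=True, return_counts=True
)

# Rows whose timestamp occurs exactly once; sorted to keep the original order.
keep = np.sort(first_index[counts == 1])
result = data[keep]
print(result)
```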
Here is a solution using numpy.unique.
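Another possible numpy.unique variant builds a boolean mask with `return_inverse`, which avoids deleting rows inside a loop and does not depend on the rows being sorted. This is a sketch and not the code originally posted; the function name `rows_with_unique_timestamp` and the sample array (with a timestamp repeated three times, out of order) are made up for illustration.

```python
import numpy as np


def rows_with_unique_timestamp(data):
    """Return the rows of a 2-D array whose column-0 timestamp occurs once."""
    timestamps = data[:, 0].astype(str)
    # inverse maps every row to the position of its timestamp in the sorted
    # unique values, so counts[inverse] is the occurrence count per row.
    _, inverse, counts = np.unique(
        timestamps, return_inverse=True, return_counts=True
    )
    mask = counts[inverse] == 1
    return data[mask]


# Hypothetical sample: 08:00 appears three times and the rows are unsorted.
sample = np.array(
    [
        ['2017-01-30T08:00:00', 1.0, 55.04],
        ['2017-01-30T06:00:00', 2.0, 55.04],
        ['2017-01-30T08:00:00', 3.0, 55.04],
        ['2017-01-30T07:00:00', 4.0, 55.04],
        ['2017-01-30T08:00:00', 5.0, 55.04],
    ],
    dtype=object,
)

filtered = rows_with_unique_timestamp(sample)
print(filtered)
```

The mask keeps the surviving rows in their original order, and a third or later repeat is handled the same way as a second one.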