Assigning values from a NumPy array to a pandas DataFrame based on almost matching unix timestamps
I am given a 2D numpy array and a huge pandas DataFrame. A dummy example of them would look somewhat like this:
arr = np.array([[1648137283, 0],
                [1648137284, 1],
                [1648137285, 2],
                [1648137286, 3],
                .....
                [1658137287, 4],
                [1658137288, 5],
                [1658137289, 6]])
df.head(-6)
                unix  ...  value_a
0         1643137283  ...       23
1         1643137284  ...       54
2         1643137285  ...       25
...              ...  ...      ...
10036787  1653174068  ...       75
10036788  1653174069  ...       65
10036789  1653174070  ...       23
The first column of arr is a unix timestamp and the second is an id value. The DataFrame also has a column for the unix timestamp. My goal is to map the id value from arr, based on the unix timestamp, to the corresponding timestamp of df, in a separate new column called 'index'.
Now, these are probably important notes:
- df contains only a portion of all timestamps from arr
- df and arr have different lengths along axis=0
- the timestamps in df are ordered in sequences and repeat themselves
- arr contains all unix timestamps from df, but not the other way around
- about 1% of the unix values do not match perfectly. My unix is in unit='ms'; some timestamps are off by +/-1 or +/-2, however, in my use case they can be seen as identical
I could do this within a loop or with np.where(). However, as arr and df are quite large, I was hoping for a fast solution.
Comments (1)
The idea is to convert the numpy array to a mapping containing key-value pairs, where the key is the unix timestamp and the value is the corresponding id; then you can use Series.map to substitute/map the values in the given dataframe.
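A minimal sketch of this idea on invented toy data: dict(arr) turns each [timestamp, id] row into a key-value pair, and Series.map looks up every value of df['unix'] in that mapping. Note that this matches exact keys only, so timestamps that are off by +/-1 or +/-2 come out as NaN. The second half is not part of the answer; it shows one possible way to cover those near matches with pandas.merge_asof, assuming a tolerance of 2. The helper column name pos is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data (all values are invented for illustration).
arr = np.array([[1000, 0], [1010, 1], [1020, 2], [1030, 3]], dtype=np.int64)
df = pd.DataFrame({"unix": [1010, 1021, 1030, 1009, 1020],
                   "value_a": [23, 54, 25, 75, 65]})

# Exact matching: timestamp -> id mapping, then a per-row lookup.
mapping = dict(arr)
exact = df["unix"].map(mapping)  # NaN where the timestamp is not a key

# Near matching (an assumption, not from the answer): merge_asof needs both
# frames sorted by the key, so sort a copy of the timestamps and remember
# each row's original position in the hypothetical column "pos".
left = pd.DataFrame({"unix": df["unix"].to_numpy(),
                     "pos": np.arange(len(df))}).sort_values("unix")
lookup = pd.DataFrame(arr, columns=["unix", "index"]).sort_values("unix")

merged = pd.merge_asof(left, lookup, on="unix",
                       direction="nearest", tolerance=2)

# Restore the original row order before writing the new column.
df["index"] = merged.sort_values("pos")["index"].to_numpy()
```

With this toy data, exact leaves the rows for 1021 and 1009 empty, while the merge_asof variant assigns them the ids of 1020 and 1010. Rows with no timestamp within the tolerance would come back as NaN.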