Rolling correlation with pandas returns NaN
I would like to find the best match for a small array "b" (a few hundred elements) within a much larger array "a" (a few million elements).
I am trying to use pandas rolling and corr to compare "b" against a sliding window over "a".
This is my code:
import pandas as pd
a = pd.read_csv(<file1>)
b = pd.read_csv(<file2>)
normalized_a = (a - a.mean()) / a.std()
normalized_b = (b - b.mean()) / b.std()
res = a.rolling(window=len(b)).corr(b)
DataFrame a is:
0
0 0.941042
1 0.656281
2 0.969081
3 0.881595
4 0.848359
... ...
1814386 -1.323574
1814387 -1.351035
1814388 -1.359450
1814389 -1.296941
1814390 -1.266813
DataFrame b is:
0 -2.256496
1 -2.949674
2 -1.614618
3 -1.784006
4 -0.976331
.. ...
287 0.378578
288 0.247859
289 0.375981
290 0.444575
291 0.450435
However, res is all NaN except for a single element (indeed, res.count() returns 1):
0
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ..
1814386 NaN
1814387 NaN
1814388 NaN
1814389 NaN
1814390 NaN
The only non-NaN element in res is located at row 291 (found with res.idxmax()):
280 NaN
281 NaN
282 NaN
283 NaN
284 NaN
285 NaN
286 NaN
287 NaN
288 NaN
289 NaN
290 NaN
291 -0.134144
292 NaN
293 NaN
294 NaN
295 NaN
296 NaN
297 NaN
298 NaN
299 NaN
Does anybody know why I get all these NaNs? I would have expected meaningful values from row 292 onward as well. Is corr a pairwise operation?
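The pattern in res is consistent with how rolling(...).corr(other) treats its argument: other is first aligned to the calling object by index label, and the rolling correlation is then computed over the paired values. Because b only has labels 0 to 291, it is NaN everywhere else after alignment, and with the default min_periods (equal to the window size) only the window ending at label 291 has a full set of pairs. The sketch below reproduces this on small synthetic Series; the sizes 20 and 5 and the random data are assumptions chosen for illustration, not the data from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=20))  # stand-in for the long series, labels 0..19
b = pd.Series(rng.normal(size=5))   # stand-in for the short pattern, labels 0..4

res = a.rolling(window=len(b)).corr(b)

# b is aligned to a's index, so it is NaN from label 5 onward.
# Only the window ending at label 4 contains five complete (a, b) pairs,
# which is what the default min_periods requires.
print(res.count())              # number of non-NaN results
print(res.first_valid_index())  # label of the first non-NaN result
```

In this setup the single valid value is the ordinary Pearson correlation between a[0:5] and b, which corresponds to the lone value at row 291 in the question.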
Thanks!
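One possible way to get a correlation for every window, sketched here under assumptions rather than offered as the definitive fix, is to work on the raw values so that no index alignment takes place. The helper name sliding_corr is hypothetical (it is not a pandas or NumPy function), and the synthetic data stands in for the two CSV files. It uses np.correlate for the sliding dot product and a pandas rolling standard deviation for the per-window spread. Normalizing a and b beforehand is not needed, since Pearson correlation is unaffected by shifting and scaling.

```python
import numpy as np
import pandas as pd


def sliding_corr(a, b):
    """Pearson correlation between b and every len(b)-long window of a.

    Hypothetical helper. Entry i of the result describes a[i : i + len(b)].
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    n = len(b)
    b_centered = b - b.mean()
    # b_centered sums to zero, so this sliding dot product already equals
    # sum((window - window.mean()) * (b - b.mean())) for each window.
    num = np.correlate(a, b_centered, mode="valid")
    # Population standard deviation of each window; drop the warm-up NaNs.
    win_std = pd.Series(a).rolling(n).std(ddof=0).to_numpy()[n - 1:]
    # Constant windows have zero spread and yield NaN or inf here.
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / (n * win_std * b.std())


# Synthetic check: hide a scaled, shifted copy of b inside a at position 120.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=30)
a[120:150] = 3.0 * b + 1.0

corr = sliding_corr(a, b)
best = int(np.nanargmax(corr))  # start position of the best-matching window
```

Single-column DataFrames such as the ones read with pd.read_csv can be passed directly, because the helper flattens its inputs. Entry i of the result corresponds to row i + len(b) - 1 of a rolling result, so the value pandas reported at row 291 would be entry 0 here.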