pandas rolling correlation returns NaN

Posted on 2025-01-11 21:45:55

I would like to find the best match for a small array "b" (length: a few hundred elements) within a much larger array "a" (length: a few million elements).
I am trying to use pandas' rolling and corr to compare "b" with a sliding window over "a".
This is my code:

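import pandas as pd

a = pd.read_csv(<file1>)  # long series, ~1.8 million rows
b = pd.read_csv(<file2>)  # short template, 292 rows

# z-scored copies (computed, but not used in the rolling call below)
normalized_a = (a - a.mean()) / a.std()
normalized_b = (b - b.mean()) / b.std()

# rolling correlation of a with b over windows of len(b) rows
res = a.rolling(window=len(b)).corr(b)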
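Side note on the normalization step: Pearson correlation does not change when a series is shifted or scaled by a positive factor, so z-scoring `a` and `b` first (as `normalized_a` / `normalized_b` do) should make no difference to `res`, and it is not what produces the NaNs. A quick check on random stand-in data (not the question's files; `zscore` is a hypothetical helper mirroring the question's formula):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-ins for the question's data (not the real CSV files).
x = pd.Series(rng.standard_normal(500))
y = pd.Series(0.5 * x + rng.standard_normal(500))


def zscore(s: pd.Series) -> pd.Series:
    """Same normalization as normalized_a / normalized_b in the question."""
    return (s - s.mean()) / s.std()


raw = x.corr(y)
normalized = zscore(x).corr(zscore(y))
print(np.isclose(raw, normalized))  # True

# The same holds window by window for a rolling correlation on aligned data.
rolling_raw = x.rolling(50).corr(y)
rolling_norm = zscore(x).rolling(50).corr(zscore(y))
print(np.allclose(rolling_raw, rolling_norm, equal_nan=True))  # True
```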

DataFrame a:

                 0
0        0.941042
1        0.656281
2        0.969081
3        0.881595
4        0.848359
...           ...
1814386 -1.323574
1814387 -1.351035
1814388 -1.359450
1814389 -1.296941
1814390 -1.266813

DataFrame b:

0   -2.256496
1   -2.949674
2   -1.614618
3   -1.784006
4   -0.976331
..        ...
287  0.378578
288  0.247859
289  0.375981
290  0.444575
291  0.450435

However, every element of res is NaN except one (in fact, res.count() returns 1):

          0
0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
...      ..
1814386 NaN
1814387 NaN
1814388 NaN
1814389 NaN
1814390 NaN

The only non-NaN element in res is located at row 291 (found with res.idxmax()):

280       NaN
281       NaN
282       NaN
283       NaN
284       NaN
285       NaN
286       NaN
287       NaN
288       NaN
289       NaN
290       NaN
291 -0.134144
292       NaN
293       NaN
294       NaN
295       NaN
296       NaN
297       NaN
298       NaN
299       NaN

Does anybody know why I get all these NaNs? I would have expected to get meaningful values after row 292. Is corr a pairwise operation?
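The NaN pattern can be reproduced with small synthetic series (random stand-ins, not the question's files). As far as I can tell from pandas' behaviour, `Rolling.corr(other)` first aligns `other` with the calling object by index: `b` only has labels 0 to len(b) - 1, so it is NaN for every later row of `a`. With the default `min_periods` (equal to the window size for an integer window), any window touching those NaNs returns NaN, and windows ending before row len(b) - 1 are incomplete. Only the window ending at row len(b) - 1 (row 291 in the question) is complete, so exactly one value survives. On the pairwise question: with the default `pairwise=None` and an explicit `other`, matching columns are correlated row by row; `b` is not slid along `a`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = pd.Series(rng.standard_normal(1_000))  # stand-in for the long series
b = pd.Series(rng.standard_normal(50))     # stand-in for the short template
w = len(b)

res = a.rolling(window=w).corr(b)
print(res.count())              # 1
print(res.first_valid_index())  # 49, i.e. len(b) - 1

# The single value is the plain Pearson correlation of the first w rows of a with b.
print(np.isclose(res.iloc[w - 1], np.corrcoef(a.iloc[:w], b)[0, 1]))  # True

# Aligning b to a's index shows where the NaNs come from: b has no rows past w - 1.
b_aligned = b.reindex(a.index)
print(b_aligned.isna().sum())   # 950
```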

Thanks!
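If the goal is the best match of `b` anywhere in `a`, one option (a swapped-in technique, not something `Rolling.corr` provides) is normalized cross-correlation: slide the mean-centred template along `a` with `np.correlate` and divide by each window's standard deviation, taken from pandas' rolling `std`. The function name `sliding_pearson` is hypothetical, and the sketch assumes 1-D float inputs. Entry `i` of the result is the Pearson correlation of `b` with `a[i : i + len(b)]`.

```python
import numpy as np
import pandas as pd


def sliding_pearson(a, b):
    """Pearson correlation of template b with every len(b)-long window of a.

    Entry i is the correlation of b with a[i : i + len(b)].
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    w = len(b)
    b_centered = b - b.mean()
    # Numerator: sum_k a[i + k] * (b[k] - mean(b)). The window mean of a drops
    # out because b_centered sums to zero.
    num = np.correlate(a, b_centered, mode="valid")
    # Population (ddof=0) std of each window of a; the first w - 1 entries of
    # the rolling result are NaN (incomplete windows) and are sliced off.
    a_std = pd.Series(a).rolling(w).std(ddof=0).to_numpy()[w - 1:]
    den = w * a_std * b.std()  # NumPy's std defaults to ddof=0
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / den  # constant windows give inf/NaN instead of a warning


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.standard_normal(10_000)
    b = rng.standard_normal(300)
    a[4_000:4_300] = 2.0 * b + 5.0  # plant a scaled, shifted copy of b

    r = sliding_pearson(a, b)
    best = int(np.nanargmax(r))
    print(best, r[best])  # 4000 and a value very close to 1.0
```

`best` is the start row of the matching window; pandas' right-aligned rolling labels would put the same window at row `best + len(b) - 1`. For the question's single-column frames, something like `sliding_pearson(a.iloc[:, 0].to_numpy(), b.iloc[:, 0].to_numpy())` should work, assuming each CSV holds one numeric column. `np.correlate` sums directly (about len(a) × len(b) multiply-adds); `scipy.signal.correlate(..., mode="valid", method="fft")` is an FFT-based alternative worth checking if that is too slow.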
