Rolling median in Python

Posted on 2024-10-29 06:10:15

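I have some stock data based on daily closing prices. I need to be able to insert these values into a Python list and get the median of the last 30 closes. Is there a Python library that does this?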


情释 2024-11-05 06:10:15

In pure Python, having your data in a Python list a, you could do

median = sum(sorted(a[-30:])[14:16]) / 2.0

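(This assumes a has at least 30 items. With 30 sorted values, the two middle ones sit at indices 14 and 15, which is what the [14:16] slice selects.)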

Using the NumPy package, you could use

import numpy
median = numpy.median(a[-30:])
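
If the closes arrive one at a time, the same idea can be wrapped in a small helper built on the standard library. This is only a sketch: the class name RecentMedian and the sample prices are made up for illustration, while collections.deque and statistics.median are standard-library calls.

```python
from collections import deque
from statistics import median

# Hypothetical helper: keeps only the most recent `window` closing prices.
class RecentMedian:
    def __init__(self, window=30):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.prices = deque(maxlen=window)

    def add(self, price):
        self.prices.append(price)

    def value(self):
        # statistics.median averages the two middle values for even counts.
        return median(self.prices)

tracker = RecentMedian(window=30)
for close in range(1, 41):   # made-up closes 1, 2, ..., 40
    tracker.add(close)
print(tracker.value())       # median of the last 30 closes (11..40)
```

Unlike the slice with fixed indices, this also works before 30 values have been collected, although statistics.median raises StatisticsError if no value has been added yet.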


不忘初心 2024-11-05 06:10:15

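Have you considered pandas? It is based on numpy and can automatically associate timestamps with your data, and it discards any unknown dates as long as you fill them with numpy.nan. It also offers some rather powerful graphing facilities via matplotlib.

Basically, it was designed for financial analysis in Python.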
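
As a rough sketch of what that could look like for the question's 30-close median, assuming pandas is installed: the dates and prices below are invented for illustration, and Series.rolling(...).median() is the pandas call doing the work.

```python
import numpy as np
import pandas as pd

# Made-up closing prices on consecutive business days (illustrative data only).
dates = pd.date_range("2024-01-01", periods=40, freq="B")
closes = pd.Series(np.arange(1.0, 41.0), index=dates)

# Median of the trailing 30 closes; NaN until 30 values are available.
rolling = closes.rolling(window=30).median()

# min_periods=1 yields a value from the first row onward instead.
partial = closes.rolling(window=30, min_periods=1).median()

print(rolling.iloc[-1])
```

Whether the leading NaN values or the partial-window medians are preferable depends on how the first 29 days should be treated.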


送舟行 2024-11-05 06:10:15

Isn't the median just the middle value in a sorted range?

So, assuming your list is stock_data:

last_thirty = stock_data[-30:]
median = sorted(last_thirty)[15]

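Now you just need to find and fix the off-by-one errors, and also handle the case where stock_data has fewer than 30 elements...

Let us try that here:

def rolling_median(data, window):
    # Slicing already copes with short lists: data[-window:] returns
    # everything when fewer than `window` items are available.
    subject = sorted(data[-window:])
    mid = len(subject) // 2  # integer division; "/" yields a float in Python 3
    if len(subject) % 2:
        return subject[mid]
    # Even count: average the two middle values.
    return (subject[mid - 1] + subject[mid]) / 2.0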
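
One way to see the off-by-one issue in sorted(last_thirty)[15] is to compare the "middle element" shortcut against statistics.median on tiny inputs. The helper name middle_element and the sample lists are made up for this sketch.

```python
from statistics import median

def middle_element(values):
    # The shortcut: take the element at index len // 2 of the sorted values.
    return sorted(values)[len(values) // 2]

odd = [3, 1, 2]
even = [4, 1, 3, 2]
print(middle_element(odd), median(odd))    # the two agree for odd lengths
print(middle_element(even), median(even))  # the two differ for even lengths
```

For an even count such as 30, the shortcut returns the upper of the two middle values rather than their average.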


许一世地老天荒 2024-11-05 06:10:15

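Found this helpful:

import numpy as np

values = [10, 20, 30, 40, 50]

# Median of everything seen so far at each position
# (the window grows from the start rather than staying at a fixed width).
med = []
for j in range(len(values)):
    med.append(np.median(values[:j + 1]))
print(med)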
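
The loop above takes the median of everything seen so far, so its window keeps growing. A fixed-width variant could look like the sketch below, which is not part of the original answer: it keeps the current window in sorted order with bisect so that it is not re-sorted from scratch at each step. The function name rolling_medians is made up.

```python
from bisect import bisect_left, insort
from collections import deque

def rolling_medians(values, window):
    """Yield the median of the last `window` values at each position."""
    recent = deque()   # values in arrival order
    ordered = []       # the same values, kept sorted
    for v in values:
        recent.append(v)
        insort(ordered, v)
        if len(recent) > window:
            # Drop the value that just fell out of the window.
            oldest = recent.popleft()
            del ordered[bisect_left(ordered, oldest)]
        n = len(ordered)
        mid = n // 2
        if n % 2:
            yield ordered[mid]
        else:
            # Even count: average the two middle values.
            yield (ordered[mid - 1] + ordered[mid]) / 2

print(list(rolling_medians([10, 20, 30, 40, 50], 3)))
```

Until the window fills up, the medians are taken over the values available so far, which matches the first few outputs of the growing-window loop.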


情愿 2024-11-05 06:10:15

While the other answers are correct, a rolling median that calls np.median inside a loop carries a large per-call overhead. Here is a much faster method, at the cost of w*|x| space.

import numpy as np

def moving_median(x, w):
    # Column idx holds x shifted down by idx rows; unfilled cells stay nan.
    shifted = np.full((len(x)+w-1, w), np.nan)
    for idx in range(w):
        shifted[idx:idx+len(x), idx] = x
    # print(shifted)
    medians = np.median(shifted, axis=1)
    # Edge rows contain nan, so recompute them from their filled entries only.
    for idx in range(w-1):
        medians[idx] = np.median(shifted[idx, :idx+1])
        medians[-idx-1] = np.median(shifted[-idx-1, -idx-1:])
    # Trim the w-1 extra rows so the output has len(x) entries (requires w >= 2).
    return medians[(w-1)//2:-(w-1)//2]

moving_median(np.arange(10), 4)
# Output
array([0.5, 1. , 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8. ])

The output has the same length as the input vector.
Edge rows in which more than half of the entries are nan are dropped. Rows in which exactly half of the entries are nan occur only for an even window width; of those, only the leading one is kept. Here is the shifted matrix from above with the respective median values:

[[ 0. nan nan nan] -> -
 [ 1.  0. nan nan] -> 0.5
 [ 2.  1.  0. nan] -> 1.0
 [ 3.  2.  1.  0.] -> 1.5
 [ 4.  3.  2.  1.] -> 2.5
 [ 5.  4.  3.  2.] -> 3.5
 [ 6.  5.  4.  3.] -> 4.5
 [ 7.  6.  5.  4.] -> 5.5
 [ 8.  7.  6.  5.] -> 6.5
 [ 9.  8.  7.  6.] -> 7.5
 [nan  9.  8.  7.] -> 8.0
 [nan nan  9.  8.] -> -
 [nan nan nan  9.]]-> -

The behaviour can be changed by adapting the final slice medians[(w-1)//2:-(w-1)//2].

Benchmark:

%%timeit
moving_median(np.arange(1000), 4)
# 267 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Alternative approach (the results will be shifted, because each window starts at position j and looks forward):

def moving_median_list(x, w):
    medians = np.zeros(len(x))
    for j in range(len(x)):
        medians[j] = np.median(x[j:j+w])
    return medians

%%timeit
moving_median_list(np.arange(1000), 4)
# 15.7 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Both algorithms scale linearly with the length of x for a fixed window width, so the gap between them is a constant factor.
In the benchmark above, moving_median is the faster option.
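
For comparison, NumPy 1.20 and later ship numpy.lib.stride_tricks.sliding_window_view, which produces the windows without filling a shifted matrix by hand. The sketch below is not from the answer above and has not been benchmarked against it; the function name windowed_median is made up. It returns one median per full window and does no edge handling.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_median(x, w):
    # Each row of `windows` is a read-only view of w consecutive samples.
    windows = sliding_window_view(np.asarray(x), w)
    # One median per full window: len(x) - w + 1 values, no edge handling.
    return np.median(windows, axis=-1)

print(windowed_median(np.arange(10), 4))
```

The result is shorter than the input by w-1 entries, so it has to be padded or aligned separately if an output of the same length is needed.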
