Using FFT to approximate the CDF of an aggregate loss random variable
Below you will find my python code for a class assignment I was given a couple weeks ago which I have been unable to successfully debug. The problem is about finding the value at risk (i.e., the p% quantile) for an aggregate loss random variable, using FFT. We are given a clear mathematical procedure by which we can gain an estimation of the discretized CDF of the aggregate loss random variable. My results are, however, seriously off and I am making some kind of mistake which I have been unable to find even after hours of debugging my code.
The aggregate loss random variable `S` is given by `S=sum(X_i for i in range(N))`, where `N` is negative binomially distributed with `r=5, beta=.2`, and the `X_i` are exponentially distributed with `theta=1`. The probability generating function for this parametrization is $P(z)=[1-\beta(z-1)]^{-r}$.
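As a quick sanity check on this parametrization, the PGF can be evaluated at a few points: P(1) should equal 1, P(0) should equal P(N=0) = (1+β)^(-r), and the slope at z=1 should give E[N] = rβ. The following is only a minimal sketch; the function name `nb_pgf` and the finite-difference step `eps` are illustrative choices, not part of the assignment.

```python
def nb_pgf(z, r=5, beta=0.2):
    """PGF of N ~ NB(r, beta): P(z) = [1 - beta*(z - 1)]^(-r)."""
    return (1 - beta * (z - 1)) ** (-r)

p_at_one = nb_pgf(1.0)            # total probability, should be 1
prob_zero_claims = nb_pgf(0.0)    # P(N = 0) = (1 + beta)^(-r)
eps = 1e-6                        # hypothetical step for the central difference
mean_claims = (nb_pgf(1 + eps) - nb_pgf(1 - eps)) / (2 * eps)  # approximates E[N] = r*beta
print(p_at_one, prob_zero_claims, mean_claims)
```

Since E[X] = theta = 1, this implies E[S] = E[N]·E[X] = 1, and `S` has a point mass at zero equal to P(N=0), which any discretized result should roughly reproduce.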
We were asked to approximate the distribution of `S` by
- choosing a grid width `h` and an integer `n` such that `r=2^n` is the number of elements to discretize `X` on,
- discretizing `X` and calculating the probabilities of being in equally spaced intervals of width `h`,
- applying the FFT to the discretized `X`,
- applying the PGF of `N` to the elements of the Fourier-transformed `X`,
- applying the inverse FFT to this vector.
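The discretization step in the list above can be sketched on its own. This assumes the rounding scheme the posted code uses (cell 0 is [0, h/2), the last cell absorbs the remaining tail) and writes the exponential CDF out by hand with NumPy instead of `scipy.stats.expon`; the function name `discretize_exponential` is hypothetical.

```python
import numpy as np

def discretize_exponential(h, m, theta=1.0):
    """Method-of-rounding pmf of X ~ Exp(mean theta) on the grid 0, h, ..., (m-1)*h."""
    edges = (np.arange(1, m) - 0.5) * h          # cell boundaries h/2, 3h/2, ..., (m - 3/2)*h
    cdf_at_edges = 1.0 - np.exp(-edges / theta)  # exponential CDF at each boundary
    # Cell 0 is [0, h/2); interior cells are differences of the CDF;
    # the last cell absorbs the whole remaining tail so the masses sum to 1.
    return np.diff(np.concatenate(([0.0], cdf_at_edges, [1.0])))

f_X = discretize_exponential(h=1e-2, m=2 ** 10)
grid = np.arange(f_X.size) * 1e-2
print(f_X.sum(), (grid * f_X).sum())   # total mass and discretized mean
```

Checking that the masses sum to 1 and that the discretized mean is close to `theta` is a cheap way to rule out the discretization as the source of an error before looking at the transform steps.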
The resulting vector should be an approximation for the probability masses of each such interval for `S`. I know from previous methods that the 95% VaR ought to be ~4 and the 99.9% VaR ought to be ~10. But my code returns nonsensical results. Generally speaking, the index at which my ECDF reaches levels >0.95 comes far too late, and even after hours of debugging I have not managed to find where I am going wrong.
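One independent way to cross-check reference values like these is a plain Monte Carlo simulation of `S`. The sketch below is not one of the "previous methods" mentioned above, just an assumption-laden check: the seed, the sample size, and the name `r_nb` are illustrative, and it relies on NumPy's `negative_binomial(n, p)` corresponding to this parametrization when `n = r` and `p = 1/(1+beta)`.

```python
import numpy as np

rng = np.random.default_rng(0)
r_nb, beta, theta = 5, 0.2, 1.0
n_sims = 400_000

# NumPy's negative_binomial(n, p) has PGF [p / (1 - (1 - p) z)]^n,
# which matches [1 - beta (z - 1)]^(-r) when n = r and p = 1 / (1 + beta).
counts = rng.negative_binomial(r_nb, 1.0 / (1.0 + beta), size=n_sims)

# A sum of k iid exponentials with mean theta is Gamma(shape=k, scale=theta);
# k = 0 means no claims, so S = 0 exactly.
S = np.where(counts > 0, rng.gamma(np.maximum(counts, 1), theta), 0.0)

var_95, var_999 = np.quantile(S, [0.95, 0.999])
print(S.mean(), var_95, var_999)
```

The 99.9% quantile rests on only a few hundred tail samples, so it is noticeably noisier than the 95% one; it is meant as a rough plausibility check rather than a precise benchmark.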
I have also asked this question on the math stackexchange, since this question is very much on the intersection of programming and math and I have no idea at this moment whether the issue is on the implementation side of things or whether I am applying the mathematical ideas wrong.
    import numpy as np
    from scipy.stats import expon
    from scipy.fft import fft, ifft

    r, beta, theta = 5, .2, 1
    var_levels = [.95, .999]


    def discretize_X(h: float, m: int):
        X = expon(scale=theta)
        f_X = [X.cdf(h / 2),
               *[X.cdf(j * h + h / 2) - X.cdf(j * h - h / 2) for j in range(1, m - 1)],
               X.sf((m - 1) * h - h / 2)]
        return f_X


    # Probability generating function of N ~ NB(r, beta)
    def PGF(z: [float, complex]):
        return (1 - beta * (z - 1)) ** (-r)


    h = 1e-2
    n = 10
    r = 2 ** n
    VaRs, TVaRs = [], []

    # discretize X with (r-1) cells of width h and one final cell with the survival function at h*(r-1)
    f_X = discretize_X(h, r)
    phi_vec = fft(f_X)
    f_tilde_vec_fft = np.array([PGF(phi) for phi in phi_vec])
    f_S = np.real(ifft(f_tilde_vec_fft))
    ecdf_S = np.cumsum(f_S)  # calc cumsum to get ECDF

    for p in var_levels:
        var_idx = np.where(ecdf_S >= p)[0][0]  # get lowest index where ecdf_S >= p
        print("p =", p, "\nVaR idx:", var_idx)
        var = h * var_idx  # VaR should be this index times the cell width
        print("VaR:", var)
        # TVaR should be each cell's probability times the value inside that cell
        tvar = 1 / (1 - p) * np.sum(f_S[var_idx:] * np.array([i * h for i in range(var_idx, r)]))
        VaRs.append(var)
        TVaRs.append(tvar)

    print("VaRs:", VaRs, "TVaRs:", TVaRs)
Comments (1)
I'm not sure about the math, but in the snippet the variable `r` gets overwritten: by the time `f_tilde_vec_fft` is computed, the function `PGF` uses `1024` for `r` instead of the intended `5`.

Fix: rename `r` to `r_nb` in the definition of the hyperparameters:

    r_nb, beta, theta = 5, .2, 1

and also in the function `PGF`:

    return (1 - beta * (z - 1)) ** (-r_nb)

After running with the other parameters unchanged (such as `h`, `n`, etc.), I get `[4.05, 9.06]` for `VaRs`.
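Putting the rename together with the original procedure, a compact version might look like the sketch below. It is only an illustration under a few assumptions: the function name `aggregate_var_fft` and the grid-size name `m` are made up here, `numpy.fft` is used in place of `scipy.fft`, and the exponential CDF is written out by hand rather than taken from `scipy.stats.expon`.

```python
import numpy as np

def aggregate_var_fft(h, n, levels, r_nb=5, beta=0.2, theta=1.0):
    """Approximate VaRs of S via FFT on a grid of m = 2**n cells of width h."""
    m = 2 ** n                                    # grid size, kept separate from r_nb
    edges = (np.arange(1, m) - 0.5) * h           # rounding boundaries h/2, 3h/2, ...
    cdf = 1.0 - np.exp(-edges / theta)            # exponential CDF at the boundaries
    f_X = np.diff(np.concatenate(([0.0], cdf, [1.0])))  # last cell holds the tail
    phi = np.fft.fft(f_X)                         # discrete transform of the severity pmf
    f_S = np.real(np.fft.ifft((1 - beta * (phi - 1)) ** (-r_nb)))  # PGF, then invert
    ecdf = np.cumsum(f_S)
    return [h * int(np.argmax(ecdf >= p)) for p in levels]

vars_n10 = aggregate_var_fft(1e-2, 10, [0.95, 0.999])   # same h and n as the question
vars_n12 = aggregate_var_fft(1e-2, 12, [0.95, 0.999])   # longer grid, same h
print(vars_n10, vars_n12)
```

Keeping the grid size under its own name avoids the shadowing problem entirely. One further point to be aware of: the FFT computes a circular convolution, so any mass of `S` beyond `m*h` wraps around to the start of the grid. With `h = 1e-2` and `n = 10` the grid only reaches 10.24, which may pull the 99.9% figure below the ~10 the question expects; a longer grid (larger `n` at the same `h`) should reduce that effect.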