Monte Carlo continuation of a multi-column pandas time series

Posted on 2025-02-13 17:19:22

I have a set of time series stored in a pandas DataFrame. The columns are assumed to be independent of one another. I want to set up a Monte Carlo process to estimate the expected value of each column. I assume the underlying data follows a Brownian motion, so I need to draw normally distributed samples for the differences between consecutive points in time.

I transform my data like this:

diffs = (data.diff() / data.shift(1))

This is what I have at the moment:

data = diffs.describe()

This gives the following output:

               A            B            C
count  4986.000000  4963.000000  1861.000000
mean      0.000285     0.000109     0.000421
std       0.015759     0.015426     0.014676
...

I process it like this to generate more samples:

import numpy as np
desired_samples = 1000
random = np.random.default_rng().normal(loc=[data.loc[["mean"]].to_numpy()], scale=[data.loc[["std"]].to_numpy()], size=[len(data.columns), desired_samples])

However this gives me an error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (441, 1000) and arg 1 with shape (1, 1, 441).

What I want is simply a matrix of random values whose columns have the same mean and std as the corresponding columns of the sample. That is, calling random.describe() would give something like:

              A         B         C
count  1000.000000  1000.000000  1000.000000
mean      0.000285     0.000109     0.000421
std       0.015759     0.015426     0.014676
...

What would be the correct way to generate those samples?

莫多说 2025-02-20 17:19:22

You could use apply() to build a DataFrame of random normal values from the mean and std of each column.

Generate Test Data

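import numpy as np
import pandas as pd

# 50 test rows per column, each column drawn with its own mean and std
nv = 50
d = {'A': np.random.normal(1, 1, nv), 'B': np.random.normal(2, 2, nv), 'C': np.random.normal(3, 3, nv)}
df = pd.DataFrame(d)
print(df)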

           A         B         C
0   0.276252 -2.833479  5.746740
1   1.562030  1.497242  2.557416
2   0.883105 -0.861824  3.106192
3   0.352372  0.014653  4.006219
4   1.475524  3.151062 -1.392998
5   2.011649 -2.289844  4.371251
6   3.230964  3.578058  0.610422
7   0.366506  3.391327  0.812932
8   1.669673 -1.021665  4.262500
9   1.835547  4.292063  6.983015
10  1.768208  4.029970  3.971751
...
45  0.501706  0.926860  7.008008
46  1.759266 -0.215047  4.560403
47  1.899167  0.690204 -0.538415
48  1.460267  1.506934  1.306303
49  1.641662  1.066182  0.049233

df.describe()

               A          B          C
count  50.000000  50.000000  50.000000
mean    0.962083   1.522234   2.992492
std     1.073733   1.848754   2.838976

Generate random values with approximately the same (calculated) mean and std

mat = df.apply(lambda x: np.random.normal(x.mean(),x.std(),100))
print(mat)
           A         B         C
0   0.234955  2.201961  1.910073
1   1.973203  3.528576  5.925673
2  -0.858201  2.234295  1.741338
3   2.245650  2.805498  0.135784
4   1.913691  2.134813  2.246989
..       ...       ...       ...
95  2.996207  2.248727  2.792658
96  0.663609  4.533541  1.518872
97  0.848259 -0.348086  2.271724
98  3.672370  1.706185 -0.862440
99  0.392051  0.832358 -0.354981

[100 rows x 3 columns]

mat.describe()
                A           B           C
count  100.000000  100.000000  100.000000
mean     0.877725    1.332039    2.673327
std      1.148153    1.749699    2.447532
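
The same draw can also be done in a single vectorized call, which is closer to what the question attempted. The error there most likely comes from the shapes: `data.loc[["mean"]].to_numpy()` is already 2-D, wrapping it in a list makes it `(1, 1, n_cols)`, and that cannot broadcast against a `size` of `(n_cols, desired_samples)`. A minimal sketch, assuming a `describe()`-style table rebuilt by hand under the hypothetical name `stats`: pass `loc` and `scale` as 1-D arrays and put the columns on the last axis of `size`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for diffs.describe(), using the values shown in the question.
stats = pd.DataFrame(
    {"A": [0.000285, 0.015759], "B": [0.000109, 0.015426], "C": [0.000421, 0.014676]},
    index=["mean", "std"],
)

desired_samples = 1000
rng = np.random.default_rng(seed=0)  # fixed seed, for reproducibility only

# loc and scale have shape (n_cols,); size is (n_rows, n_cols), so the
# per-column parameters broadcast along the last axis.
samples = rng.normal(
    loc=stats.loc["mean"].to_numpy(),
    scale=stats.loc["std"].to_numpy(),
    size=(desired_samples, len(stats.columns)),
)
random_df = pd.DataFrame(samples, columns=stats.columns)
print(random_df.describe().loc[["count", "mean", "std"]])
```

As with the apply() version, the sample mean and std only match the targets approximately, and the match tightens as `desired_samples` grows.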

If you want the matrix as a NumPy array:

mat.to_numpy()
array([[ 0.78881292,  3.09428714, -1.22757096],
       [ 0.13044099, -1.02564025,  2.6566989 ],
       [ 0.06090083,  1.50629474,  3.61487469],
       [ 0.71418932,  1.88441111,  5.84979454],
       [ 2.34287411,  2.58478867, -4.04433653],
       [ 1.41846256,  0.36414635,  8.47482082],
       [ 0.46765842,  1.37188986,  3.28011085],
       [ 0.87433273,  3.45735286,  1.13351138],
       [ 1.59029413,  4.0227165 ,  3.58282534],
       [ 2.23663894,  2.75007385, -0.36242541],
       [ 1.80967311,  1.29206572,  1.73277577],
       [ 1.20787923,  2.75529187,  4.64721489],
       [ 2.33466341,  6.43830387,  4.31354348],
       [ 0.87379125,  3.00658046,  4.94270155],
       etc ...
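
Since the question's `diffs` are relative changes (`data.diff() / data.shift(1)`), the sampled values can be compounded back into levels to estimate an expected future value per column. The following is only a rough sketch under stated assumptions: the `last_value` numbers, `n_paths`, and `n_steps` are made up for illustration, and the per-column mean and std are the ones quoted in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed, for reproducibility only

# Per-column mean/std of the relative changes (values from the question) and
# hypothetical last observed values for each series.
mean = pd.Series({"A": 0.000285, "B": 0.000109, "C": 0.000421})
std = pd.Series({"A": 0.015759, "B": 0.015426, "C": 0.014676})
last_value = pd.Series({"A": 100.0, "B": 50.0, "C": 20.0})

n_paths, n_steps = 2000, 30

# Shape (n_paths, n_steps, n_cols): independent relative changes per column.
changes = rng.normal(mean.to_numpy(), std.to_numpy(), size=(n_paths, n_steps, len(mean)))

# Compound the changes along the time axis to turn them back into levels.
paths = last_value.to_numpy() * np.cumprod(1.0 + changes, axis=1)

# Monte Carlo estimate of the expected value at the final step, per column.
expected_final = pd.Series(paths[:, -1, :].mean(axis=0), index=mean.index)
print(expected_final)
```

With independent normal changes, the estimate should land near `last_value * (1 + mean) ** n_steps`, which gives a quick sanity check on the simulation.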
