Multidimensional correlation in C#

Posted on 2024-12-27 08:11:40


I've got two N-dimensional series of points, each of length M. The objective is to correlate them and calculate the correlation coefficient. For one-dimensional series, the correlation coefficient can be calculated from the formulae for variance, covariance and standard deviation.

What I don't understand is how to adapt the algorithm to account for all N dimensions instead of just one. Consider the following:

series A = [0, 0] [1, 1] [2, 2] [3, 3]  
series B = [0,  0] [1, -1] [2, -2] [3, -3]

If we use only the first dimension for correlation, we get +1.00; if we use the second, we get -1.00. But if we consider both dimensions together, the answer clearly won't be as simple as +1.00 or -1.00.

So I want to know how to formulate this sort of multidimensional correlation, preferably in C#.

Feel free to ask for further clarification, or edit the post to improve it. =)

EDIT: The series I'm using are stock time series. I retrieve the latest M samples of CLOSE prices as series A and correlate it with all historic data using a sliding window (data[1] to data[M], data[2] to data[M+1], ..., data[1000] to data[M+999], and so on). The offset where the correlation is highest is the point in time where the price behaviour was most similar to now. By analyzing whether the price moved up or down after that point, we can predict which way the price might move now. But I'm not using just CLOSE prices (one dimension): I want to identify regions where a number of metrics were similar, for instance CLOSE, VOLUME, etc. So the time series doesn't have just one value at every index but a whole array of values.
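A minimal C# sketch of the sliding-window search described above, for the one-dimensional CLOSE case. The names `SlidingWindowSearch` and `FindBestOffset` are hypothetical, and the sketch assumes `history` already excludes the latest M samples (otherwise the query trivially matches itself at the end). It uses the population form of the Pearson coefficient.

```csharp
using System;

static class SlidingWindowSearch
{
    // Pearson correlation between query[0..m) and history[offset..offset+m),
    // written as sum(dq*dh) / sqrt(sum(dq^2) * sum(dh^2)), which is algebraically
    // the same as dividing the covariance sum by M * stdDevQ * stdDevH.
    static double Pearson(double[] query, double[] history, int offset)
    {
        int m = query.Length;
        double meanQ = 0, meanH = 0;
        for (int i = 0; i < m; i++)
        {
            meanQ += query[i];
            meanH += history[offset + i];
        }
        meanQ /= m;
        meanH /= m;

        double sqq = 0, shh = 0, sqh = 0;
        for (int i = 0; i < m; i++)
        {
            double dq = query[i] - meanQ;
            double dh = history[offset + i] - meanH;
            sqq += dq * dq;
            shh += dh * dh;
            sqh += dq * dh;
        }
        return sqh / Math.Sqrt(sqq * shh); // NaN if either window is constant
    }

    // Slides a window of query.Length samples over history and returns the offset
    // whose window correlates most strongly with query
    // (Offset = -1 if no window produced a valid correlation).
    public static (int Offset, double Correlation) FindBestOffset(double[] query, double[] history)
    {
        int bestOffset = -1;
        double best = double.NegativeInfinity;
        for (int offset = 0; offset + query.Length <= history.Length; offset++)
        {
            double r = Pearson(query, history, offset);
            if (r > best) // comparisons with NaN are false, so flat windows are skipped
            {
                best = r;
                bestOffset = offset;
            }
        }
        return (bestOffset, best);
    }
}
```

For example, `FindBestOffset(new double[] { 1, 2, 4, 3 }, new double[] { 5, 5, 5, 10, 20, 40, 30, 7, 1 })` returns offset 3 with correlation 1, because that window is the query scaled by 10: correlation ignores level and scale and only compares shape. To use a match for prediction you also need at least one sample after the window, so in practice you may want to stop the scan one step earlier.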

If I use just CLOSE in the correlation, I can't guarantee that the VOLUME sequences of these series will be similar too. Likewise, if I use VOLUME, I can't guarantee that the CLOSE sequences will be similar. So I need a formula for normalized correlation that is based on some sort of distance metric, something like a^2 + b^2: if the CLOSE values are similar, a^2 will be small; if the VOLUME values are similar, b^2 will be small; so if a^2 + b^2 is small, both CLOSE and VOLUME are similar.
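One way to turn the a^2 + b^2 idea into code, sketched in C# under stated assumptions: each metric is z-scored within its own window (so CLOSE and VOLUME, which live on very different scales, contribute comparably, and only the shape of each window matters, not its level), then the squared differences are averaged over all time steps and metrics. `MultiMetricDistance` and its methods are hypothetical names, and per-window z-scoring is an assumed choice rather than the only reasonable normalization.

```csharp
using System;

static class MultiMetricDistance
{
    // Z-scores one metric (column) of a window: subtract the window mean and divide
    // by the population standard deviation. A constant column yields NaN values.
    static double[] ZScores(double[][] window, int metric)
    {
        int m = window.Length;
        double mean = 0;
        for (int t = 0; t < m; t++) mean += window[t][metric];
        mean /= m;

        double sumSq = 0;
        for (int t = 0; t < m; t++)
        {
            double dev = window[t][metric] - mean;
            sumSq += dev * dev;
        }
        double stdDev = Math.Sqrt(sumSq / m);

        var z = new double[m];
        for (int t = 0; t < m; t++) z[t] = (window[t][metric] - mean) / stdDev;
        return z;
    }

    // window[t][d] is metric d (e.g. 0 = CLOSE, 1 = VOLUME) at time step t.
    // Returns the mean squared difference of z-scores over all steps and metrics;
    // smaller means more similar, 0 means identical shapes.
    public static double Distance(double[][] a, double[][] b)
    {
        if (a.Length != b.Length || a.Length == 0)
            throw new ArgumentException("Windows must be non-empty and of equal length.");

        int m = a.Length, metrics = a[0].Length;
        double sum = 0;
        for (int d = 0; d < metrics; d++)
        {
            double[] za = ZScores(a, d), zb = ZScores(b, d);
            for (int t = 0; t < m; t++)
            {
                double diff = za[t] - zb[t];
                sum += diff * diff; // the a^2 (CLOSE) and b^2 (VOLUME) terms
            }
        }
        return sum / (m * metrics);
    }
}
```

With this normalization each metric contributes 2 - 2r, where r is that metric's Pearson correlation, so the score equals 2 - 2 × (average per-metric correlation): 0 for identical shapes and 4 when every metric is perfectly inverted. For series A and B from the question it gives 2, i.e. an average correlation of 0, between the +1 and -1 of the individual dimensions.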

Previously, what I was doing was as follows:
1. Use CLOSE prices to calculate a correlation.
2. Use VOLUME to calculate a correlation.
3. Multiply these values together. This will ensure that a high combined value implies that both CLOSE and VOLUME have strong individual correlations.
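A small C# sketch of step 3, assuming the CLOSE and VOLUME correlations have already been computed for a window. It also shows one caveat of plain multiplication: two negative correlations multiply to a positive score, so a window where both CLOSE and VOLUME moved opposite to the current pattern can look like a strong match. The class name and the `PositiveOnly` variant are hypothetical, not part of the approach described above.

```csharp
using System;

static class CombinedScore
{
    // Step 3 as described: multiply the individual correlations.
    public static double Naive(double closeR, double volumeR) => closeR * volumeR;

    // Hypothetical variant: only reward windows where both correlations are positive,
    // since Naive(-0.9, -0.9) = 0.81 would otherwise rank an inverted window highly.
    public static double PositiveOnly(double closeR, double volumeR) =>
        closeR > 0 && volumeR > 0 ? closeR * volumeR : 0.0;
}
```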

EDIT:

stdDevX = Sqrt (Summation ((x - Mean(x)) * (x - Mean(x))) / M)
stdDevY = Sqrt (Summation ((y - Mean(y)) * (y - Mean(y))) / M)
corrXY  = Summation ((x - Mean(x)) * (y - Mean(y))) / (M * stdDevX * stdDevY)

where each Summation runs over the M samples of the series.

http://en.wikipedia.org/wiki/Standard_deviation
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
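A C# sketch of the Pearson coefficient these formulae describe, using population standard deviations and dividing the covariance sum by the number of samples so the result stays within [-1, 1], applied to each dimension separately. The class and method names are hypothetical. Run on series A and B from the question, it returns +1 for the first dimension and -1 for the second; it does not by itself combine the dimensions into one number.

```csharp
using System;

static class Correlation
{
    // Pearson correlation of two equally long 1-D series: population standard
    // deviations (divide by M) and the covariance sum divided by M as well.
    // Returns NaN if either series is constant.
    public static double Pearson(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length == 0)
            throw new ArgumentException("Series must be non-empty and of equal length.");

        int m = x.Length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < m; i++)
        {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= m;
        meanY /= m;

        double sumXX = 0, sumYY = 0, sumXY = 0;
        for (int i = 0; i < m; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sumXX += dx * dx;
            sumYY += dy * dy;
            sumXY += dx * dy;
        }

        double stdDevX = Math.Sqrt(sumXX / m);
        double stdDevY = Math.Sqrt(sumYY / m);
        return sumXY / (m * stdDevX * stdDevY);
    }

    // Applies the 1-D formula to each dimension on its own.
    // a[t][d] is dimension d of point t; returns one coefficient per dimension.
    public static double[] PerDimension(double[][] a, double[][] b)
    {
        if (a.Length != b.Length || a.Length == 0)
            throw new ArgumentException("Series must be non-empty and of equal length.");

        int dims = a[0].Length;
        var result = new double[dims];
        for (int d = 0; d < dims; d++)
        {
            var x = new double[a.Length];
            var y = new double[b.Length];
            for (int t = 0; t < a.Length; t++)
            {
                x[t] = a[t][d];
                y[t] = b[t][d];
            }
            result[d] = Pearson(x, y);
        }
        return result;
    }
}
```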

The above formulae assume that both series x and y are one-dimensional. My main concern is how to adapt them to multidimensional vectors. I want to use this to find regions in history where all price metrics were similar, but it could be used by anyone who wishes to correlate any sort of vectors: the x, y, z coordinates of an object, etc.


虐人心 2025-01-03 08:11:40


It's not clear from the question, but I think you are being asked to treat each series separately. So considering just Series A as a sequence of samples from a pair of variables X and Y, the two variables are completely tied (if you drew a scatter plot, all the values would be on a straight line from bottom-left to top-right) so the correlation is +1.

In contrast, considering just Series B as another sequence of samples from X and Y, this time the scatter plot would again be a straight line, but from top-left to bottom-right: increasing X decreases Y. The correlation is -1.

It gets more interesting if each series contains samples from three variables (for example, snapshots of the prices of three stocks over time). Here is a simple example:

            X  Y  Z   X  Y   Z   X  Y   Z   X  Y   Z
series C = [0, 0, 0] [1, 1, -1] [2, 2, -2] [3, 3, -3]

Here, you need to consider the correlation between each pair of variables. In this simple case, the correlation between X and Y is +1, between X and Z is -1 and between Y and Z is -1.
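A C# sketch of the pairwise approach in this answer: given one series whose points hold several variables, compute the Pearson correlation for every pair of variables. The names are hypothetical, and the Pearson helper uses the population form. For series C it reproduces the values above: +1 for X/Y, -1 for X/Z and -1 for Y/Z (with 1 on the diagonal).

```csharp
using System;
using System.Linq;

static class PairwiseCorrelation
{
    // Pearson correlation of two equally long series, written as
    // sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)).
    static double Pearson(double[] x, double[] y)
    {
        double meanX = x.Average(), meanY = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }

    // series[t][v] is variable v (e.g. 0 = X, 1 = Y, 2 = Z) at sample t.
    // Returns r where r[i, j] is the correlation between variables i and j.
    public static double[,] Matrix(double[][] series)
    {
        int n = series[0].Length;

        // Pull each variable out into its own column (ToArray runs immediately).
        var columns = new double[n][];
        for (int v = 0; v < n; v++)
            columns[v] = series.Select(point => point[v]).ToArray();

        var r = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i, j] = Pearson(columns[i], columns[j]);
        return r;
    }
}
```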

Edit: Combining correlations

Suppose you have samples from three variables – close, high and low – for two time periods and want to know how good a match the two periods are. You could calculate the correlations between the two time periods for each variable in the traditional way. Suppose this yields close-correlation = 0.6, high-correlation = 0.3, and low-correlation = 0.4.

You need some method of combining the individual correlations into a goodness of fit score in such a way that individual correlations far from zero (i.e. highly correlated, either positively or negatively) have a bigger contribution to the score than those close to zero. Simple approaches include taking the product (0.6 * 0.3 * 0.4 = 0.072) or the root-mean-square (sqrt((0.6^2 + 0.3^2 + 0.4^2) / 3) = 0.4509) – you'll have to experiment to find the method that gives you the most reliable results.
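The two combination rules above, as a C# sketch (the class and method names are hypothetical). With the example correlations 0.6, 0.3 and 0.4 they give the 0.072 and 0.4509 quoted above.

```csharp
using System;
using System.Linq;

static class CombineCorrelations
{
    // Product of the individual correlations (e.g. close, high, low).
    public static double Product(double[] correlations) =>
        correlations.Aggregate(1.0, (acc, r) => acc * r);

    // Root-mean-square: sqrt of the mean of squared correlations. Squaring drops
    // the sign, so strong negative and strong positive correlations count equally.
    public static double RootMeanSquare(double[] correlations) =>
        Math.Sqrt(correlations.Sum(r => r * r) / correlations.Length);
}
```

The root-mean-square treats a -0.6 exactly like a +0.6, which matches the "far from zero" criterion; if the direction of the match matters for your prediction, check the signs separately.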


瀟灑尐姊 2025-01-03 08:11:40

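static class SeriesComparer
{
   // Scores how well two 2-D series match point by point: +1 for every index where
   // both dimensions agree within allowedVariance, -1 otherwise.
   // seriesA[i][d] is dimension d of point i; both series must have the same length.
   public static int GetCorrelationScore(decimal[][] seriesA, decimal[][] seriesB, decimal allowedVariance = 0.5m)
   {
      int correlationScore = 0;

      for (var i = 0; i < seriesA.Length; i++)
      {
         if (AreEqual(seriesA[i][0], seriesB[i][0], allowedVariance) &&
             AreEqual(seriesA[i][1], seriesB[i][1], allowedVariance))
            correlationScore++;
         else
            correlationScore--;
      }

      return correlationScore;
   }

   // True when value2 lies strictly inside (value1 - allowedVariance, value1 + allowedVariance).
   static bool AreEqual(decimal value1, decimal value2, decimal allowedVariance)
   {
      var lowValue1 = value1 - allowedVariance;
      var highValue1 = value1 + allowedVariance;

      return lowValue1 < value2 && highValue1 > value2;
   }
}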
