3-day rolling correlation calculation in MATLAB

Posted 2024-11-15 00:45:55

I need to calculate a 3-day rolling correlation. A sample matrix is given below. My problem is that an ID may not be in the universe every day. For example, AAPL may always be in the universe, but another company, say CCL, may be in my universe for just 2 days. I would appreciate a vectorized solution. I might have to use structs, accumarray, etc. here, since the size of the correlation matrix may vary from day to day.

% col1 = tradingDates, col2 = companyID_asInts, col3 = VALUE_forCorrelation

rawdata = [ ...

    734614 1 0.5; 
    734614 2 0.4; 
    734614 3 0.1; 

    734615 1 0.6; 
    734615 2 0.4; 
    734615 3 0.2; 
    734615 4 0.5; 
    734615 5 0.12;

    734618 1 0.11; 
    734618 2 0.9; 
    734618 3 0.2; 
    734618 4 0.1; 
    734618 5 0.33;
    734618 6 0.55; 

    734619 2 0.11; 
    734619 3 0.45; 
    734619 4 0.1; 
    734619 5 0.6; 
    734619 6 0.5;

    734620 5 0.1; 
    734620 6 0.3] ; 

'3-day correlation':

% 734614 & 734615: no correlation is computed, since fewer than 3 days are available
% 734618_corr = corrcoef(values of IDs 1,2,3 are used; IDs 4,5,6 are ignored) -> 3x3 input, 3x3 correlation matrix
% 734619_corr = corrcoef(values of IDs 2,3,4,5 are used; IDs 1,6 are ignored) -> 3x4 input, 4x4 correlation matrix
% 734620_corr = corrcoef(values of IDs 5,6 are used; IDs 1,2,3,4 are ignored) -> 3x2 input, 2x2 correlation matrix
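
To make the first case concrete, here is a minimal sketch of the window ending on 734618, with the values for IDs 1, 2 and 3 copied by hand from rawdata. The variable names `window` and `R` are only illustrative.

```matlab
% Window ending on 734618: rows = days 734614, 734615, 734618;
% columns = IDs 1, 2, 3 (values copied by hand from rawdata).
window = [0.5  0.4 0.1;
          0.6  0.4 0.2;
          0.11 0.9 0.2];

% corrcoef treats each column as a variable and each row as an observation,
% so this 3-by-3 input yields a 3-by-3 correlation matrix.
R = corrcoef(window);
disp(size(R))
```

With four surviving IDs the input would be 3-by-4 and the result 4-by-4, which is why the output size changes from day to day.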

The real data covers the Russell 1000 universe from 1995 to 2011 and has over 4.1 million rows. The desired correlation window is 20 days.

Comments (1)

自找没趣 2024-11-22 00:45:55

I wouldn't try to get a vectorized solution here: thanks to the MATLAB JIT compiler, loops can often be just as fast on recent versions of MATLAB.

Your matrix looks a lot like a sparse matrix: would it help to convert it into that form, so that you can use array indexing? This probably only works if the data in the third column can never be 0; otherwise you'll have to keep the current explicit list and use something like this:

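dates = unique(rawdata(:, 1));      % sorted list of trading dates
num_comps = max(rawdata(:, 2));     % company IDs are assumed to run from 1 to num_comps

for d = 1:length(dates) - 2
    days = dates(d:d + 2);          % the three consecutive trading dates in this window

    % Find the companies that are present on every day of the window.
    companies = true(1, num_comps);
    for curr_day = days'
        c = false(1, num_comps);
        c(rawdata(rawdata(:, 1) == curr_day, 2)) = true;
        companies = companies & c;
    end
    companies = find(companies);

    % Collect their values: one row per day, one column per company.
    data = zeros(3, length(companies));
    for curr_day = 1:3
        for company = 1:length(companies)
            data(curr_day, company) = ...
                rawdata(rawdata(:, 1) == days(curr_day) & ...
                        rawdata(:, 2) == companies(company), 3);
        end
    end

    % No semicolon, so the correlation matrix for this window is displayed.
    corrcoef(data)
end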
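
The sparse-matrix idea mentioned above could look roughly like the following. This is only a sketch under the stated assumption that no value in the third column is ever 0, since a zero would be indistinguishable from a missing company. It also assumes each (date, ID) pair appears once, because `sparse` sums duplicates. The names `win`, `S`, `keep` and `results` are made up for illustration.

```matlab
% Sample data from the question: [date, companyID, value]
rawdata = [734614 1 0.5;  734614 2 0.4;  734614 3 0.1; ...
           734615 1 0.6;  734615 2 0.4;  734615 3 0.2; ...
           734615 4 0.5;  734615 5 0.12; ...
           734618 1 0.11; 734618 2 0.9;  734618 3 0.2; ...
           734618 4 0.1;  734618 5 0.33; 734618 6 0.55; ...
           734619 2 0.11; 734619 3 0.45; 734619 4 0.1; ...
           734619 5 0.6;  734619 6 0.5; ...
           734620 5 0.1;  734620 6 0.3];

win = 3;  % window length in trading days (would be 20 for the real data)

% Map each date to a row index and build a days-by-companies sparse matrix.
[dates, ~, dayIdx] = unique(rawdata(:, 1));
S = sparse(dayIdx, rawdata(:, 2), rawdata(:, 3));

% results{d} describes the window ending on dates(d); earlier cells stay empty.
results = cell(numel(dates), 1);
for d = win:numel(dates)
    block = S(d - win + 1:d, :);
    % Keep a company only if it has a stored (nonzero) value on every day.
    keep = find(all(block ~= 0, 1));
    results{d} = struct('ids', keep, 'R', corrcoef(full(block(:, keep))));
end
```

For the sample data, `results{3}.ids` is `[1 2 3]`, `results{4}.ids` is `[2 3 4 5]` and `results{5}.ids` is `[5 6]`, matching the windows listed in the question. The loop over dates remains, but the per-company inner loops are replaced by column indexing.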