分桶算法

发布于 2024-07-20 07:29:21 字数 762 浏览 6 评论 0原文

我有一些可以工作的代码,但有点瓶颈,而且我一直在试图找出如何加快速度。 它处于循环中,我不知道如何对其进行矢量化。

我有一个二维数组 vals,它表示时间序列数据。 行是日期,列是不同的系列。 我试图按月对数据进行存储,以对其执行各种操作(总和、平均值等)。 这是我当前的代码:

allDts; %Dates/times for vals.  Size is [size(vals, 1), 1]
vals;
[Y M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates

[Y M] = datevec(fomDates);
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);

newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed

for k = 1:length(fomDates);

下一行是瓶颈,因为我调用它很多次。(循环)

    idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
    bucketed = vals(idx, :);
    newVals(k, :) = nansum(bucketed);
end %for

有什么想法吗? 提前致谢。

I've got some code that works, but is a bit of a bottleneck, and I'm stuck trying to figure out how to speed it up. It's in a loop, and I can't figure how to vectorize it.

I've got a 2D array, vals, that represents timeseries data. Rows are dates, columns are different series. I'm trying to bucket the data by months to perform various operations on it (sum, mean, etc). Here is my current code:

allDts; %Dates/times for vals.  Size is [size(vals, 1), 1]
vals;
[Y M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates

[Y M] = datevec(fomDates);
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);

newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed

for k = 1:length(fomDates);

This next line is the bottleneck because I call it so many times.(looping)

    idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
    bucketed = vals(idx, :);
    newVals(k, :) = nansum(bucketed);
end %for

Any Ideas? Thanks in advance.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

哆兒滾 2024-07-27 07:29:22

这是一个很难矢量化的问题。 我可以建议一种使用 CELLFUN 的方法,但我不能保证它会更快地解决您的问题(您必须根据您正在使用的特定数据集自行计时)。 如 另一个SO问题,矢量化并不总是比for循环运行得更快。 它可能是针对特定问题的,这是最好的选择。 根据该免责声明,我将建议您尝试两种解决方案:CELLFUN 版本和可能运行得更快的 for 循环版本的修改版。

CELLFUN 解决方案:

[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1);  % Start date of each month
[monthStart,sortIndex] = sort(monthStart);  % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart);  % Get unique start dates

valCell = mat2cell(vals(sortIndex,:),diff([0 uniqueIndex]));
newVals = cellfun(@nansum,valCell,'UniformOutput',false);

调用 MAT2CELL 将具有相同开始日期的 vals 行分组到元胞数组 valCell 的单元格中。 变量 newVals 将是长度为 numel(uniqueStarts) 的元胞数组,其中每个元胞将包含对相应元胞执行 nansum 的结果valCell

FOR循环解决方案:

[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1);  % Start date of each month
[monthStart,sortIndex] = sort(monthStart);  % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart);  % Get unique start dates

vals = vals(sortIndex,:);  % Sort the values according to start date
nMonths = numel(uniqueStarts);
uniqueIndex = [0 uniqueIndex];
newVals = nan(nMonths,size(vals,2));  % Preallocate
for iMonth = 1:nMonths,
  index = (uniqueIndex(iMonth)+1):uniqueIndex(iMonth+1);
  newVals(iMonth,:) = nansum(vals(index,:));
end

That's a difficult problem to vectorize. I can suggest a way to do it using CELLFUN, but I can't guarantee that it will be faster for your problem (you would have to time it yourself on the specific data sets you are using). As discussed in this other SO question, vectorizing doesn't always work faster than for loops. It can be very problem-specific which is the best option. With that disclaimer, I'll suggest two solutions for you to try: a CELLFUN version and a modification of your for-loop version that may run faster.

CELLFUN SOLUTION:

[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1);  % Start date of each month
[monthStart,sortIndex] = sort(monthStart);  % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart);  % Get unique start dates

valCell = mat2cell(vals(sortIndex,:),diff([0 uniqueIndex]));
newVals = cellfun(@nansum,valCell,'UniformOutput',false);

The call to MAT2CELL groups the rows of vals that have the same start date together into cells of a cell array valCell. The variable newVals will be a cell array of length numel(uniqueStarts), where each cell will contain the result of performing nansum on the corresponding cell of valCell.

FOR-LOOP SOLUTION:

[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1);  % Start date of each month
[monthStart,sortIndex] = sort(monthStart);  % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart);  % Get unique start dates

vals = vals(sortIndex,:);  % Sort the values according to start date
nMonths = numel(uniqueStarts);
uniqueIndex = [0 uniqueIndex];
newVals = nan(nMonths,size(vals,2));  % Preallocate
for iMonth = 1:nMonths,
  index = (uniqueIndex(iMonth)+1):uniqueIndex(iMonth+1);
  newVals(iMonth,:) = nansum(vals(index,:));
end
夜空下最亮的亮点 2024-07-27 07:29:22

如果您需要做的就是对矩阵的行求和或平均值,其中行根据另一个变量(日期)求和,然后使用我的合并器函数。 它的设计正是为了执行此操作,根据指标系列的值减少数据。 (实际上,consolidator 也可以处理 nd 数据,并且具有一定的容差,但您所需要做的就是向其传递月份和年份信息。)

在 Matlab Central 上的文件交换上查找合并器

If all you need to do is form the sum or mean on rows of a matrix, where the rows are summed depending upon another variable (date) then use my consolidator function. It is designed to do exactly this operation, reducing data based on the values of an indicator series. (Actually, consolidator can also work on n-d data, and with a tolerance, but all you need to do is pass it the month and year information.)

Find consolidator on the file exchange on Matlab Central

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文