Bucketing algorithm
I've got some code that works, but it is a bit of a bottleneck, and I'm stuck trying to figure out how to speed it up. It's in a loop, and I can't figure out how to vectorize it.
I've got a 2D array, vals, that represents timeseries data. Rows are dates, columns are different series. I'm trying to bucket the data by months to perform various operations on it (sum, mean, etc). Here is my current code:
allDts; %Dates/times for vals. Size is [size(vals, 1), 1]
vals;
[Y, M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates
[Y, M] = datevec(fomDates);
%DateUtil.monthLength is not a MATLAB built-in; eomday(Y, M) is the built-in days-in-month function
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);
newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed
for k = 1:length(fomDates)
    %This next line is the bottleneck because I call it so many times (looping)
    idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
    bucketed = vals(idx, :);
    %sum down the columns; without the dim argument a one-row month would be summed across columns
    newVals(k, :) = nansum(bucketed, 1);
end %for
Any ideas? Thanks in advance.
That's a difficult problem to vectorize. I can suggest a way to do it using CELLFUN, but I can't guarantee that it will be faster for your problem (you would have to time it yourself on the specific data sets you are using). As discussed in this other SO question, vectorized code doesn't always run faster than for loops. Which option is best can be very problem-specific. With that disclaimer, I'll suggest two solutions for you to try: a CELLFUN version and a modification of your for-loop version that may run faster.
CELLFUN SOLUTION:
The call to MAT2CELL groups the rows of vals that have the same start date together into cells of a cell array valCell. The variable newVals will be a cell array of length numel(uniqueStarts), where each cell will contain the result of performing nansum on the corresponding cell of valCell.
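The original code for this solution did not survive. Here is a minimal sketch of the approach described above. It is a reconstruction, not the answerer's exact code. The names `valCell`, `uniqueStarts` and `newVals` come from the description. The sample data, `startDates`, `groupIdx` and `rowCounts` are hypothetical additions. The sketch assumes `allDts` is sorted, because `mat2cell` splits `vals` into consecutive row blocks. It also passes dimension 1 to `nansum` so that a month with a single row still sums down its columns.

```matlab
% Hypothetical sample data (not from the question): two series of daily
% values spanning three months, sorted by date, with one missing value.
allDts = (datenum(2010, 1, 1):datenum(2010, 3, 31))';
vals = rand(numel(allDts), 2);
vals(5, 1) = NaN;

% First-of-month date for every row; rows in the same month share a value.
[Y, M] = datevec(allDts);
startDates = datenum(Y, M, 1);

% groupIdx(i) is the bucket number of row i.
[uniqueStarts, ~, groupIdx] = unique(startDates);

% Number of rows in each bucket. mat2cell cuts vals into consecutive row
% blocks, so this assumes allDts is sorted (each month's rows are contiguous).
rowCounts = accumarray(groupIdx(:), 1);
valCell = mat2cell(vals, rowCounts, size(vals, 2));

% nansum down the rows of each block; dim 1 keeps one-row blocks correct.
newVals = cellfun(@(block) nansum(block, 1), valCell, 'UniformOutput', false);

% Optional: stack the 1-by-nCols results into a numel(uniqueStarts)-by-nCols matrix.
newValsMat = vertcat(newVals{:});
```

Because `'UniformOutput'` is false, `newVals` is a cell array as described. The final `vertcat` is only needed if you want the same matrix layout as the question's `newVals`.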
FOR-LOOP SOLUTION:
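The code for this solution is also missing. Below is one possible modification of the question's loop, offered as a sketch under stated assumptions rather than the answerer's original. The question's loop compares every element of `allDts` against two dates on each iteration, so its cost grows with the number of rows times the number of months. This version assumes `allDts` is sorted. It finds where the month changes once, then each iteration indexes only that month's contiguous block of rows. The names `edges`, `startDates` and `nBuckets` and the sample data are hypothetical.

```matlab
% Hypothetical sample data (not from the question), sorted by date.
allDts = (datenum(2010, 1, 1):datenum(2010, 3, 31))';
vals = rand(numel(allDts), 2);
vals(5, 1) = NaN;

% Bucket key for every row: the first-of-month date.
[Y, M] = datevec(allDts);
startDates = datenum(Y, M, 1);

% Assuming allDts is sorted, a bucket ends wherever the key changes.
% Rows of bucket k are edges(k)+1 through edges(k+1).
edges = [0; find(diff(startDates) ~= 0); numel(startDates)];
fomDates = startDates(edges(1:end-1) + 1);   % first-of-month date of each bucket

nBuckets = numel(edges) - 1;
newVals = nan(nBuckets, size(vals, 2));      % preallocate
for k = 1:nBuckets
    rows = (edges(k) + 1):edges(k + 1);      % contiguous row range; no full-array comparison
    newVals(k, :) = nansum(vals(rows, :), 1);
end
```

Whether this beats the CELLFUN version depends on your data and MATLAB version, so time both on a realistic data set.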
If all you need to do is form the sum or mean of rows of a matrix, where the rows are grouped by another variable (the date), then use my consolidator function. It is designed to do exactly this operation: reducing data based on the values of an indicator series. (Actually, consolidator can also work on n-d data, and with a tolerance, but all you need to do is pass it the month and year information.)
Find consolidator on the file exchange on Matlab Central
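consolidator's calling syntax is not reproduced here. For comparison, the same kind of indicator-based reduction can be done with the built-in `accumarray`, keyed on the year and month. This is a swapped-in technique, not consolidator itself. It is a minimal sketch that assumes NaNs should be skipped, as `nansum` does. The names `monthKeys`, `groupIdx`, `sums`, `counts` and `means` and the sample data are hypothetical. It has no loop over months and does not require the rows to be sorted.

```matlab
% Hypothetical sample data (not from the question); rows need not be sorted.
allDts = (datenum(2010, 1, 1):datenum(2010, 3, 31))';
vals = rand(numel(allDts), 2);
vals(5, 1) = NaN;

% Group rows by (year, month): monthKeys has one [Y M] row per bucket,
% and groupIdx(i) is the bucket of row i.
[Y, M] = datevec(allDts);
[monthKeys, ~, groupIdx] = unique([Y, M], 'rows');
nBuckets = size(monthKeys, 1);

[nRows, nCols] = size(vals);
rowSubs = repmat(groupIdx(:), 1, nCols);   % bucket subscript of each element
colSubs = repmat(1:nCols, nRows, 1);       % series (column) subscript of each element
isValid = ~isnan(vals);                    % skip NaNs, as nansum does

subs = [rowSubs(isValid), colSubs(isValid)];
sums = accumarray(subs, vals(isValid), [nBuckets, nCols]);
counts = accumarray(subs, ones(nnz(isValid), 1), [nBuckets, nCols]);

% Per-bucket mean of the non-NaN values (NaN where a bucket has none).
means = sums ./ counts;
```

A bucket whose values are all NaN gets a sum of 0, matching `nansum`, and a mean of NaN because 0/0 is NaN.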