MATLAB: Accessing a loaded MAT file is very slow
I'm currently working on a project that saves and loads fairly large MAT files (around 150 MB), and I realized that accessing a loaded cell array is much slower than accessing an equivalent one created inside a script or function.
I created this example to simulate my code and show the difference:
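```matlab
clear; clc;
disp('Test for computing with loading');

% Start from a clean state: remove any MAT file left by a previous run
if exist('data.mat', 'file')
    delete('data.mat');
end

% Build a cell array of n_tests random 1x4096 row vectors
n_tests = 10000;
data = cell(1, n_tests);   % preallocate instead of growing with data{end+1}
for i = 1:n_tests
    data{i} = rand(1, 4096);
end

% Uncomment to save the data (v7.3 format, as stated below) and reload it
% disp('Saving data');
% save('data.mat', 'data', '-v7.3');
% clear('data');
%
% disp('Loading data');
% load('data.mat', '-mat');

% Time the squared distance from data{i} to every data{j}
for i = 1:n_tests
    tic;
    for j = 1:n_tests
        d = sum((data{i} - data{j}) .^ 2);
    end
    time = toc;
    disp(['#' num2str(i) ' computed in ' num2str(time) ' s']);
end
```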
As written, the code neither saves nor loads a MAT file, and the average time for one iteration over i is 0.75 s. When I uncomment the lines that save and load the file, one iteration over i takes about 6.2 s (not counting the save/load time). That is roughly 8x slower!
I'm using 64-bit MATLAB 7.12.0 (R2011a) on 64-bit Windows 7, and the MAT files are saved in the v7.3 format.
Could this be related to the compression of the MAT file, or to variable caching?
Is there any way to prevent or avoid this?
Comments (2)
I have run into this problem too. I think it is also related to MATLAB's inefficient memory management; as I recall, it does not handle swapping well.
A 150 MB file can easily hold a lot of data, possibly more than can be allocated quickly.
I made a quick calculation for your example using the information provided by MathWorks. In your case

total_size = n_tests*121 + n_tests*(1*4096*8)

which for n_tests = 10000 gives 1,210,000 + 327,680,000 = 328,890,000 bytes, or about 313 MiB. First, I would suggest saving the data in format v7 instead of v7.3; I have noticed very poor performance when reading the newer format. That alone could explain your slowdown.
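The estimate and the v7 suggestion can be checked with a minimal MATLAB sketch like the following. It assumes the question's sizes (10000 cells of 1x4096 doubles) and the answer's figure of 121 bytes of overhead per cell; the file name `data_v7.mat` is hypothetical. It compares the estimate with what `whos` reports, then saves with `-v7`. Keep in mind that v7 MAT files cannot store variables of 2 GB or more, which is what `-v7.3` is for.

```matlab
% Sketch: compare the per-cell estimate with MATLAB's own size report,
% then save in the v7 format instead of v7.3.
% Assumptions: sizes mirror the question; 'data_v7.mat' is a hypothetical name.
n_tests = 10000;
data = cell(1, n_tests);
for i = 1:n_tests
    data{i} = rand(1, 4096);
end

% Answer's estimate: 121 bytes of overhead per cell plus 8 bytes per double
estimated_bytes = n_tests*121 + n_tests*(1*4096*8);

% Size MATLAB reports for the variable in memory
info = whos('data');
fprintf('Estimated: %.1f MiB, whos: %.1f MiB\n', ...
    estimated_bytes / 2^20, info.bytes / 2^20);

% Save in v7 format (compressed, but not HDF5-based like v7.3).
% v7 cannot hold variables of 2 GB or more; those require -v7.3.
save('data_v7.mat', 'data', '-v7');
file_info = dir('data_v7.mat');
fprintf('File size on disk: %.1f MiB\n', file_info.bytes / 2^20);
```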
Personally I solved this in two ways:
I tested this code on 64-bit Windows with 64-bit MATLAB R2014b.
Without saving and loading, the computation takes around 0.22 s.
After saving the data file with '-v7' and loading it, the computation takes around 0.2 s.
After saving the data file with '-v7.3' and loading it, the computation takes around 4.1 s.
So it is related to how the MAT file is saved: only data loaded from the '-v7.3' file is slow to access.
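The three measurements above can be reproduced with a sketch along these lines. It is a minimal MATLAB script, assuming a reduced size (n = 2000 cells instead of 10000) and timing only the first outer iteration (i = 1) so that it finishes quickly; the file names `bench_v7.mat` and `bench_v73.mat` are hypothetical. Like the question's script, it clears `data` and loads it back from each file before timing the inner loop. Because of the smaller size, absolute times will not match the ones reported.

```matlab
% Sketch: time the question's inner loop (i = 1 only) for data built in
% memory, loaded from a -v7 file, and loaded from a -v7.3 file.
% Assumptions: n reduced from 10000 to 2000; file names are hypothetical.
n = 2000;
data = cell(1, n);
for k = 1:n
    data{k} = rand(1, 4096);
end
save('bench_v7.mat', 'data', '-v7');
save('bench_v73.mat', 'data', '-v7.3');

files = {'', 'bench_v7.mat', 'bench_v73.mat'};  % '' means: use the in-memory data
times = zeros(1, numel(files));
for f = 1:numel(files)
    if ~isempty(files{f})
        clear('data');
        load(files{f}, '-mat');  % recreates the variable 'data' from the file
    end
    tic;
    for j = 1:n
        d = sum((data{1} - data{j}) .^ 2); %#ok<NASGU>
    end
    times(f) = toc;
end
fprintf('in memory: %.3f s, -v7: %.3f s, -v7.3: %.3f s\n', times);

% Remove the temporary files
delete('bench_v7.mat');
delete('bench_v73.mat');
```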