MATLAB：将多个变量保存到“-v7.3” (HDF5).mat-files 在使用“-append”时似乎更快旗帜。怎么会？

发布于 2024-10-16 19:43:38 字数 7160 浏览 7 评论 0原文

注意： 此问题涉及 2011 年旧 MATLAB 版本 (R2009a) 中观察到的问题。根据 2016 年 7 月的更新，MATLAB 中的问题/错误似乎不再存在（使用 R2016a 进行测试；向下滚动到问题末尾以查看更新）。

我正在使用 MATLAB R2009b，我需要编写一个较大的脚本，可将较大的 .zip 文件集的内容转换为 v7.3 mat 文件（具有底层 HDF5 数据模型）。读书是可以的。问题在于保存。而且实际上也没有问题。使用保存命令可以很好地保存我的文件。

我的问题更有意义：为什么我在 MATLAB 中观察到以下令人惊讶的（对我来说）行为？

让我们总体上看一下我的问题。在当前的测试场景中，我将生成一个输出：A -v7.3 mat 文件。此 .mat 文件将包含 40 个块作为单独的变量。每个变量将被命名为从 1 到 40 的“block_NNN”，并将包含一个带有字段 frames 和 blockNo 的结构。字段frames包含一个480x240x65的uint8图像数据序列（这里只是使用randi生成的随机数据）。字段blockNo包含块号。

备注：在真实的脚本中（我尚未完成），我将总共执行上述操作 370 次，总共转换 108GB 的原始数据。这就是为什么我关心以下问题。

无论如何，首先我定义一些通用变量：

% some sizes for dummy data and loops:
num_blockCount = 40;
num_blockLength = 65;
num_frameHeight = 480;
num_frameWidth = 240;

然后生成一些形状和大小与实际原始数据相同的虚拟代码：

% generate empty struct:
stu_data2disk = struct();

% loop over blocks:
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   % generate temp struct for current block:
   temp_stu_value = struct();
   temp_stu_value.frames = randi( ...
      [0 255], ...
      [num_frameHeight num_frameWidth num_blockLength], ...
      'uint8' ...
   );
   temp_stu_value.blockNo = num_k;

   % using dynamic field names:
   stu_data2disk.(sprintf('block_%03u', num_k)) = temp_stu_value;

end

我现在拥有所有随机测试数据在结构体 stu_data2disk 中。现在我想使用两种可能的方法之一保存数据。

让我们先尝试简单的一个：

% save data (simple):
disp('Save data the simple way:')
tic;
save converted.mat -struct stu_data2disk -v7.3;
toc;

文件写入没有问题（286MB）。输出是：

Save data the simple way:
Elapsed time is 14.004449 seconds.

好的 - 然后我想起来我想在 40 个块上遵循保存过程。因此，我不是上面那样循环遍历块并按顺序附加它们：

% save to file, using append:
disp('Save data using -append:')
tic;
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   temp_str_appendToggle = '';
   if (num_k > 1)
      temp_str_appendToggle = '-append';
   end

   % generate save command:
   temp_str_saveCommand = [ ...
      'save ', ...
      'converted_append.mat ', ...
      '-struct stu_data2disk ', temp_str_blockName, ' '...
      temp_str_appendToggle, ' ', ...
      '-v7.3', ...
      ';' ...
   ];

   % evaluate save command:
   eval(temp_str_saveCommand);

end
toc;

文件再次保存得很好（286MB）。输出是：

Save data using -append:
Elapsed time is 0.956968 seconds.

有趣的是，追加方法要快得多？ 我的问题是为什么？

dir conversion*.mat 的输出：

09-02-2011  20:38       300,236,392 converted.mat
09-02-2011  20:37       300,264,316 converted_append.mat
               2 File(s)    600,500,708 bytes

文件大小不相同。在 Windows 7 中使用 fc 进行的测试揭示了……许多二进制差异。也许数据发生了一些变化——因此这没有告诉我们什么。

有人知道这里发生了什么吗？附加文件是否可能使用更优化的数据结构？或者也许 Windows 已经缓存了该文件并且可以更快地访问它？

我也尝试读取这两个文件。如果没有在此处提供数字，则附加版本会更快一些（但从长远来看可能意味着某些事情）。

[编辑]：我只是尝试使用无格式标志（在我的系统上默认为 -v7），并且不再有太大区别：

Save data the simple way (-v7):
Elapsed time is 13.092084 seconds.
Save data using -append (-v7):
Elapsed time is 14.345314 seconds.

[编辑]：我更正了上述内容错误。之前我提到统计数据是针对 -v6 的，但我错了。我刚刚删除了格式标志并假设默认值是 -v6，但实际上它是 -v7。

我使用安德鲁的精细框架为我的系统上的所有格式创建了新的测试统计数据（所有格式都用于相同的随机测试数据，现在从文件中读取）：

15:15:51.422: Testing speed, format=-v6, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:00.829: Save the simple way:            0.358 sec
15:16:01.188: Save using multiple append:     7.432 sec
15:16:08.614: Save using one big append:      1.161 sec

15:16:24.659: Testing speed, format=-v7, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:33.442: Save the simple way:           12.884 sec
15:16:46.329: Save using multiple append:    14.442 sec
15:17:00.775: Save using one big append:     13.390 sec

15:17:31.579: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:17:40.690: Save the simple way:           13.751 sec
15:17:54.434: Save using multiple append:     3.970 sec
15:17:58.412: Save using one big append:      6.138 sec

以及文件的大小：

10-02-2011  15:16       299,528,768 converted_format-v6.mat
10-02-2011  15:16       299,528,768 converted_append_format-v6.mat
10-02-2011  15:16       299,528,832 converted_append_batch_format-v6.mat
10-02-2011  15:16       299,894,027 converted_format-v7.mat
10-02-2011  15:17       299,894,027 converted_append_format-v7.mat
10-02-2011  15:17       299,894,075 converted_append_batch_format-v7.mat
10-02-2011  15:17       300,236,392 converted_format-v7.3.mat
10-02-2011  15:17       300,264,316 converted_append_format-v7.3.mat
10-02-2011  15:18       300,101,800 converted_append_batch_format-v7.3.mat
               9 File(s)  2,698,871,005 bytes

因此 -v6 似乎是写入速度最快的。文件大小也没有太大差异。据我所知，HDF5 确实内置了一些基本的膨胀方法。

嗯，可能对底层 HDF5 写入函数进行了一些优化？

目前，我仍然认为某些底层基本 HDF5 写入函数针对将数据集添加到 HDF5 文件进行了优化（这就是向 -7.3 文件添加新变量时发生的情况）。我相信我在某处读到 HDF5 应该以这种方式进行优化......尽管不能确定。

其他需要注意的细节：

正如我们在下面安德鲁的回答中看到的那样，这种行为是非常系统性的。至于是否在函数的本地范围或 m 脚本的“全局”范围内运行这些东西似乎也非常重要。我的第一个结果来自 m 脚本，其中文件被写入当前目录。我仍然只能在 m 脚本中重现 -7.3 的 1 秒写入。函数调用显然增加了一些开销。

2016 年 7 月更新：

我再次发现了这一点，并认为我可以使用目前可用的最新 MATLAB 对其进行测试。使用 Windows 7 x64 上的 MATLAB R2016a，问题似乎已得到解决：

14:04:06.277: Testing speed, imax=255, R2016a on PCWIN64, arch=AMD64, 16 GB, os=Microsoft Windows 7 Enterprise  Version 6.1 (Build 7601: Service Pack 1)
14:04:10.600: basic -v7.3:                    7.599 sec      5.261 GB used
14:04:18.229: basic -v7.3:                    7.894 sec      5.383 GB used
14:04:26.154: basic -v7.3:                    7.909 sec      5.457 GB used
14:04:34.096: basic -v7.3:                    7.919 sec      5.498 GB used
14:04:42.048: basic -v7.3:                    7.886 sec      5.516 GB used     286 MB file   7.841 sec mean
14:04:50.581: multiappend -v7.3:              7.928 sec      5.819 GB used
14:04:58.544: multiappend -v7.3:              7.905 sec      5.834 GB used
14:05:06.485: multiappend -v7.3:              8.013 sec      5.844 GB used
14:05:14.542: multiappend -v7.3:              8.591 sec      5.860 GB used
14:05:23.168: multiappend -v7.3:              8.059 sec      5.868 GB used     286 MB file   8.099 sec mean
14:05:31.913: bigappend -v7.3:                7.727 sec      5.837 GB used
14:05:39.676: bigappend -v7.3:                7.740 sec      5.879 GB used
14:05:47.453: bigappend -v7.3:                7.645 sec      5.884 GB used
14:05:55.133: bigappend -v7.3:                7.656 sec      5.877 GB used
14:06:02.824: bigappend -v7.3:                7.963 sec      5.871 GB used     286 MB file   7.746 sec mean

这是使用下面接受的答案中 Andrew Janke 的 reproMatfileAppendSpeedup 函数进行测试的（5 遍，格式 7.3）。现在，-append 对于单个保存来说同样慢，或者更慢 - 正如它应该的那样。也许这是 R2009a 中使用的 HDF5 驱动程序的早期版本的问题。

原文

NOTE: This question deals with an issue observed back in 2011 with an old MATLAB version (R2009a). As per the update below from July 2016, the issue/bug in MATLAB seems to no longer exist (tested with R2016a; scroll down to end of question to see update).

I am using MATLAB R2009b and I need to write a larger script that converts the contents of a larger set of .zip files to v7.3 mat files (with an underlying HDF5-datamodel). Reading is OK. The issue is with saving. And there is actually no problem. My files saves nicely using the save command.

My question is more in the sense: Why am I observing the following surprising (for me) behavior in MATLAB?

let's look at my issue in general. In this current test-scenario I will be generating one output: A -v7.3 mat-file. This .mat-file will contain 40 blocks as individual variables. Each variable will be named "block_NNN" from 1 to 40 and will contain a struct with fields frames and blockNo. Field frames contains a 480x240x65 sequence of uint8 imagedata (here just random data generated using randi). Field blockNo contains the block number.

Remark: In the real script (that I have yet to finish) I will be doing the above at total of 370 times, converting a total of 108GB of raw data. Which is why I am concerned with the following.

Anyway, first I define some general variables:

% some sizes for dummy data and loops:
num_blockCount = 40;
num_blockLength = 65;
num_frameHeight = 480;
num_frameWidth = 240;

I then generate some dummy code that has shape and size identical to the actual raw data:

% generate empty struct:
stu_data2disk = struct();

% loop over blocks:
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   % generate temp struct for current block:
   temp_stu_value = struct();
   temp_stu_value.frames = randi( ...
      [0 255], ...
      [num_frameHeight num_frameWidth num_blockLength], ...
      'uint8' ...
   );
   temp_stu_value.blockNo = num_k;

   % using dynamic field names:
   stu_data2disk.(sprintf('block_%03u', num_k)) = temp_stu_value;

end

I now have all my random test-data in a struct stu_data2disk. Now I would like to save the data using one of two possible methods.

Let's try the simple one first:

% save data (simple):
disp('Save data the simple way:')
tic;
save converted.mat -struct stu_data2disk -v7.3;
toc;

The file is written without problems (286MB). The output is:

Save data the simple way:
Elapsed time is 14.004449 seconds.

OK - then I remembered that I would like to follow the save-procedure over the 40 blocks. Thus instead of the above I loop over the blocks and append them in sequence:

% save to file, using append:
disp('Save data using -append:')
tic;
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   temp_str_appendToggle = '';
   if (num_k > 1)
      temp_str_appendToggle = '-append';
   end

   % generate save command:
   temp_str_saveCommand = [ ...
      'save ', ...
      'converted_append.mat ', ...
      '-struct stu_data2disk ', temp_str_blockName, ' '...
      temp_str_appendToggle, ' ', ...
      '-v7.3', ...
      ';' ...
   ];

   % evaluate save command:
   eval(temp_str_saveCommand);

end
toc;

And again the file saves nicely (286MB). The output is:

Save data using -append:
Elapsed time is 0.956968 seconds.

Interestingly the append-method is much faster? My question is why?

Output from dir converted*.mat:

09-02-2011  20:38       300,236,392 converted.mat
09-02-2011  20:37       300,264,316 converted_append.mat
               2 File(s)    600,500,708 bytes

The files are not identical in size. And a test with fc in windows 7 revealed ... well many binary differences. Perhaps the data was shifted a bit - thus this tells us nothing.

Does someone have an idea what is going on here? Is the appended file using a much more optimized data-structure perhaps? Or maybe windows has cached the file and makes access to it much faster?

I made the effort of test-reading from the two files as well. Without presenting the numbers here the appended version was a little bit faster (could mean something in the long run though).

[EDIT]: I just tried using no format flag (defaults to -v7 on my system) and there is not much difference anymore:

Save data the simple way (-v7):
Elapsed time is 13.092084 seconds.
Save data using -append (-v7):
Elapsed time is 14.345314 seconds.

[EDIT]: I corrected the above mistake. Previously I mentioned that the stats were for -v6 but I was mistaken. I had just removed the format flag and assumed the default was -v6 but actually it is -v7.

I have created new test stats for all formats on my system using Andrew's fine framework (all formats are for the same random test data, now read from file):

15:15:51.422: Testing speed, format=-v6, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:00.829: Save the simple way:            0.358 sec
15:16:01.188: Save using multiple append:     7.432 sec
15:16:08.614: Save using one big append:      1.161 sec

15:16:24.659: Testing speed, format=-v7, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:33.442: Save the simple way:           12.884 sec
15:16:46.329: Save using multiple append:    14.442 sec
15:17:00.775: Save using one big append:     13.390 sec

15:17:31.579: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:17:40.690: Save the simple way:           13.751 sec
15:17:54.434: Save using multiple append:     3.970 sec
15:17:58.412: Save using one big append:      6.138 sec

And the sizes of the files:

10-02-2011  15:16       299,528,768 converted_format-v6.mat
10-02-2011  15:16       299,528,768 converted_append_format-v6.mat
10-02-2011  15:16       299,528,832 converted_append_batch_format-v6.mat
10-02-2011  15:16       299,894,027 converted_format-v7.mat
10-02-2011  15:17       299,894,027 converted_append_format-v7.mat
10-02-2011  15:17       299,894,075 converted_append_batch_format-v7.mat
10-02-2011  15:17       300,236,392 converted_format-v7.3.mat
10-02-2011  15:17       300,264,316 converted_append_format-v7.3.mat
10-02-2011  15:18       300,101,800 converted_append_batch_format-v7.3.mat
               9 File(s)  2,698,871,005 bytes

Thus -v6 seems to be the fastest for writing. Also not any large differences in files sizes. HDF5 does have some basic inflate-method built-in as far as I know.

Hmm, probably some optimization in the underlying HDF5-write functions?

Currently I still think that some underlying fundamental HDF5-write function is optimized for adding datasets to an HDF5-file (which is what happens when adding new variables to a -7.3 file). I believe I have read somewhere that HDF5 should optimized in this very way... though cannot be sure.

Other details to note:

The behavior is very systemic as we see in Andrew's answer below. It also seems to be quite important as to whether or not you run these things in a local scope of a function or in the "global" of an m-script. My first results were from an m-script where files were written to the current directory. I can still only reproduce the 1-second write for -7.3 in the m-script. The function-calls add some overhead apparently.

Update July 2016:

I found this again and thought I might test it with the newest MATLAB available to me at the moment. With MATLAB R2016a on Windows 7 x64 the problem seems to have been fixed:

14:04:06.277: Testing speed, imax=255, R2016a on PCWIN64, arch=AMD64, 16 GB, os=Microsoft Windows 7 Enterprise  Version 6.1 (Build 7601: Service Pack 1)
14:04:10.600: basic -v7.3:                    7.599 sec      5.261 GB used
14:04:18.229: basic -v7.3:                    7.894 sec      5.383 GB used
14:04:26.154: basic -v7.3:                    7.909 sec      5.457 GB used
14:04:34.096: basic -v7.3:                    7.919 sec      5.498 GB used
14:04:42.048: basic -v7.3:                    7.886 sec      5.516 GB used     286 MB file   7.841 sec mean
14:04:50.581: multiappend -v7.3:              7.928 sec      5.819 GB used
14:04:58.544: multiappend -v7.3:              7.905 sec      5.834 GB used
14:05:06.485: multiappend -v7.3:              8.013 sec      5.844 GB used
14:05:14.542: multiappend -v7.3:              8.591 sec      5.860 GB used
14:05:23.168: multiappend -v7.3:              8.059 sec      5.868 GB used     286 MB file   8.099 sec mean
14:05:31.913: bigappend -v7.3:                7.727 sec      5.837 GB used
14:05:39.676: bigappend -v7.3:                7.740 sec      5.879 GB used
14:05:47.453: bigappend -v7.3:                7.645 sec      5.884 GB used
14:05:55.133: bigappend -v7.3:                7.656 sec      5.877 GB used
14:06:02.824: bigappend -v7.3:                7.963 sec      5.871 GB used     286 MB file   7.746 sec mean

This was tested with Andrew Janke's reproMatfileAppendSpeedup function in the accepted answer below (5 passes with format 7.3). Now, -append is equally slow, or slower, to a single save - as it should be. Perhaps it was a problem with an early build of the HDF5 driver used in R2009a.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

不疑不惑不回忆 2024-10-23 19:43:38

圣牛。我可以重现。也尝试了单追加变体；它甚至更快。看起来“-append”神奇地使基于 HDF5 的 save() 速度提高了 30 倍。我没有解释，但我想分享我的发现。

我将您的测试代码包装在一个函数中，对其进行重构以使保存逻辑与测试数据结构无关，以便您可以在其他数据集上运行它，并添加了更多诊断输出。

到处都看不到大幅加速。它在我的 64 位 XP 机器和 32 位 Server 2003 机器上很大，在我的 64 位 Windows 7 机器上很大，但在 32 位 XP 机器上不存在。（尽管在 Server 2003 上多次追加是一个巨大的损失。）R2010b 在许多情况下速度较慢。也许 HDF5 附加或保存它的使用在较新的 Windows 版本上很稳定。（XP x64 实际上是 Server 2003 内核。）或者可能只是机器配置差异。 XP x64 机器上有一个快速 RAID，而 32 位 XP 的 RAM 比其他机器要少。您运行什么操作系统和架构？你也可以尝试一下这个重现吗？

19:36:40.289: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft(R) Windows(R) XP Professional x64 Edition 5.2.3790 Service Pack 2 Build 3790
19:36:55.930: Save the simple way:           11.493 sec
19:37:07.415: Save using multiple append:     1.594 sec
19:37:09.009: Save using one big append:      0.424 sec


19:39:21.681: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows XP Professional 5.1.2600 Service Pack 3 Build 2600
19:39:37.493: Save the simple way:           10.881 sec
19:39:48.368: Save using multiple append:    10.187 sec
19:39:58.556: Save using one big append:     11.956 sec


19:44:33.410: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
19:44:50.789: Save the simple way:           14.354 sec
19:45:05.156: Save using multiple append:     6.321 sec
19:45:11.474: Save using one big append:      2.143 sec


20:03:37.907: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft(R) Windows(R) Server 2003, Enterprise Edition 5.2.3790 Service Pack 2 Build 3790
20:03:58.532: Save the simple way:           19.730 sec
20:04:18.252: Save using multiple append:    77.897 sec
20:05:36.160: Save using one big append:      0.630 sec

这看起来很大。如果它适用于其他数据集，我自己可能会在很多地方使用这个技巧。这也可能是通过 MathWorks 提出的。他们也可以在正常保存或其他操作系统版本中使用快速追加技术吗？

这是独立的重现功能。

function out = reproMatfileAppendSpeedup(nPasses, tests, imax, formats)
%REPROMATFILEAPPENDSPEEDUP Show how -append makes v7.3 saves much faster
%
% Examples:
% reproMatfileAppendSpeedup()
% reproMatfileAppendSpeedup(2, [], 0, {'7.3','7','6'}); % low-entropy test

if nargin < 1 || isempty(nPasses);  nPasses = 1;  end
if nargin < 2 || isempty(tests);    tests = {'basic','multiappend','bigappend'}; end
if nargin < 3 || isempty(imax);     imax = 255; end
if nargin < 4 || isempty(formats);  formats = '7.3'; end % -v7 and -v6 do not show the speedup
tests = cellstr(tests);
formats = cellstr(formats);

fprintf('%s: Testing speed, imax=%d, R%s on %s\n',...
    timestamp, imax, version('-release'), systemDescription());

tempDir = setupTempDir();
testData = generateTestData(imax);

testMap = struct('basic','saveSimple', 'multiappend','saveMultiAppend', 'bigappend','saveBigAppend');

for iFormat = 1:numel(formats)
    format = formats{iFormat};
    formatFlag = ['-v' format];
    %fprintf('%s: Format %s\n', timestamp, formatFlag);
    for iTest = 1:numel(tests)
        testName = tests{iTest};
        saveFcn = testMap.(testName);
        te = NaN(1, nPasses);
        for iPass = 1:nPasses
            fprintf('%s: %-30s', timestamp, [testName ' ' formatFlag ':']);
            t0 = tic;
            matFile = fullfile(tempDir, sprintf('converted-%s-%s-%d.mat', testName, format, i));
            feval(saveFcn, matFile, testData, formatFlag);
            te(iPass) = toc(t0);
            if iPass == nPasses
                fprintf('%7.3f sec      %5.3f GB used   %5.0f MB file   %5.3f sec mean\n',...
                    te(iPass), physicalMemoryUsed/(2^30), getfield(dir(matFile),'bytes')/(2^20), mean(te));
            else
                fprintf('%7.3f sec      %5.3f GB used\n', te(iPass), physicalMemoryUsed/(2^30));
            end
        end
        % Verify data to make sure we are sane
        gotBack = load(matFile);
        gotBack = rmfield(gotBack, intersect({'dummy'}, fieldnames(gotBack)));
        if ~isequal(gotBack, testData)
            fprintf('ERROR: Loaded data differs from original for %s %s\n', formatFlag, testName);
        end
    end
end

% Clean up
rmdir(tempDir, 's');

%%
function saveSimple(file, data, formatFlag)
save(file, '-struct', 'data', formatFlag);

%%
function out = physicalMemoryUsed()
if ~ispc
    out = NaN;
    return; % memory() only works on Windows
end
[u,s] = memory();
out = s.PhysicalMemory.Total - s.PhysicalMemory.Available;

%%
function saveBigAppend(file, data, formatFlag)
dummy = 0;
save(file, 'dummy', formatFlag);
fieldNames = fieldnames(data);
save(file, '-struct', 'data', fieldNames{:}, '-append', formatFlag);

%%
function saveMultiAppend(file, data, formatFlag)
fieldNames = fieldnames(data);
for i = 1:numel(fieldNames)
    if (i > 1); appendFlag = '-append'; else; appendFlag = ''; end
    save(file, '-struct', 'data', fieldNames{i}, appendFlag, formatFlag);
end


%%
function testData = generateTestData(imax)
nBlocks = 40;
blockSize = [65 480 240];
for i = 1:nBlocks
    testData.(sprintf('block_%03u', i)) = struct('blockNo',i,...
        'frames', randi([0 imax], blockSize, 'uint8'));
end

%%
function out = timestamp()
%TIMESTAMP Showing timestamps to make sure it is not a tic/toc problem
out = datestr(now, 'HH:MM:SS.FFF');

%%
function out = systemDescription()
if ispc
    platform = [system_dependent('getos'),' ',system_dependent('getwinsys')];
elseif ismac
    [fail, input] = unix('sw_vers');
    if ~fail
        platform = strrep(input, 'ProductName:', '');
        platform = strrep(platform, sprintf('\t'), '');
        platform = strrep(platform, sprintf('\n'), ' ');
        platform = strrep(platform, 'ProductVersion:', ' Version: ');
        platform = strrep(platform, 'BuildVersion:', 'Build: ');
    else
        platform = system_dependent('getos');
    end
else
    platform = system_dependent('getos');
end
arch = getenv('PROCESSOR_ARCHITEW6432');
if isempty(arch)
    arch = getenv('PROCESSOR_ARCHITECTURE');
end
try
    [~,sysMem] = memory();
catch
    sysMem.PhysicalMemory.Total = NaN;
end
out = sprintf('%s, arch=%s, %.0f GB, os=%s',...
    computer, arch, sysMem.PhysicalMemory.Total/(2^30), platform);

%%
function out = setupTempDir()
out = fullfile(tempdir, sprintf('%s - %s', mfilename, datestr(now, 'yyyymmdd-HHMMSS-FFF')));
mkdir(out);

编辑：我修改了 repro 函数，添加了多次迭代并参数化它以保存样式、文件格式和 randi 生成器的 imax。

我认为文件系统缓存是快速追加行为的一个重要因素。当我使用 reproMatfileAppendSpeedup(20) 连续运行一堆并在 Process Explorer 中观察系统信息时，大多数运行时间都在一秒以下，并且物理内存使用量迅速增加了几 GB。然后，每经过十几次，写入就会停止并需要 20 或 30 秒，并且物理 RAM 使用率会慢慢下降到大约开始时的水平。我认为这意味着 Windows 在 RAM 中缓存大量写入，并且 -append 的某些功能使其更愿意这样做。但对我来说，包括这些摊位在内的摊销时间仍然比基本保存快很多。

顺便说一句，在进行了几个小时的多次传递后，我很难重现原始的计时。

Holy cow. I can reproduce. Tried the single-append variation too; it's even speedier. Looks like "-append" just magically makes HDF5-based save() 30x faster. I don't have an explanation but I wanted to share what I found.

I wrapped up your test code in a function, refactoring it to make the save logic agnostic about the test data structure so you can run it on other data sets, and added some more diagnostic output.

Don't see the big speedup everywhere. It's huge on my 64-bit XP box and a 32-bit Server 2003 box, big on my 64-bit Windows 7 box, nonexistent on a 32-bit XP box. (Though multiple appends are a huge loss on Server 2003.) R2010b is slower in many cases. Maybe HDF5 appends or save's use of it just rock on newer Windows builds. (XP x64 is actually the Server 2003 kernel.) Or maybe it's just a machine config difference. There's a fast RAID on the XP x64 machine, and the 32-bit XP has less RAM than the rest. What OS and architecture are you running? Can you try this repro too?

19:36:40.289: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft(R) Windows(R) XP Professional x64 Edition 5.2.3790 Service Pack 2 Build 3790
19:36:55.930: Save the simple way:           11.493 sec
19:37:07.415: Save using multiple append:     1.594 sec
19:37:09.009: Save using one big append:      0.424 sec


19:39:21.681: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows XP Professional 5.1.2600 Service Pack 3 Build 2600
19:39:37.493: Save the simple way:           10.881 sec
19:39:48.368: Save using multiple append:    10.187 sec
19:39:58.556: Save using one big append:     11.956 sec


19:44:33.410: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
19:44:50.789: Save the simple way:           14.354 sec
19:45:05.156: Save using multiple append:     6.321 sec
19:45:11.474: Save using one big append:      2.143 sec


20:03:37.907: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft(R) Windows(R) Server 2003, Enterprise Edition 5.2.3790 Service Pack 2 Build 3790
20:03:58.532: Save the simple way:           19.730 sec
20:04:18.252: Save using multiple append:    77.897 sec
20:05:36.160: Save using one big append:      0.630 sec

This looks huge. If it holds up on other data sets, I might use this trick in a lot of places myself. It may be something to bring up with MathWorks, too. Could they use the fast append technique in normal saves or other OS versions, too?

Here's the self-contained repro function.

function out = reproMatfileAppendSpeedup(nPasses, tests, imax, formats)
%REPROMATFILEAPPENDSPEEDUP Show how -append makes v7.3 saves much faster
%
% Examples:
% reproMatfileAppendSpeedup()
% reproMatfileAppendSpeedup(2, [], 0, {'7.3','7','6'}); % low-entropy test

if nargin < 1 || isempty(nPasses);  nPasses = 1;  end
if nargin < 2 || isempty(tests);    tests = {'basic','multiappend','bigappend'}; end
if nargin < 3 || isempty(imax);     imax = 255; end
if nargin < 4 || isempty(formats);  formats = '7.3'; end % -v7 and -v6 do not show the speedup
tests = cellstr(tests);
formats = cellstr(formats);

fprintf('%s: Testing speed, imax=%d, R%s on %s\n',...
    timestamp, imax, version('-release'), systemDescription());

tempDir = setupTempDir();
testData = generateTestData(imax);

testMap = struct('basic','saveSimple', 'multiappend','saveMultiAppend', 'bigappend','saveBigAppend');

for iFormat = 1:numel(formats)
    format = formats{iFormat};
    formatFlag = ['-v' format];
    %fprintf('%s: Format %s\n', timestamp, formatFlag);
    for iTest = 1:numel(tests)
        testName = tests{iTest};
        saveFcn = testMap.(testName);
        te = NaN(1, nPasses);
        for iPass = 1:nPasses
            fprintf('%s: %-30s', timestamp, [testName ' ' formatFlag ':']);
            t0 = tic;
            matFile = fullfile(tempDir, sprintf('converted-%s-%s-%d.mat', testName, format, i));
            feval(saveFcn, matFile, testData, formatFlag);
            te(iPass) = toc(t0);
            if iPass == nPasses
                fprintf('%7.3f sec      %5.3f GB used   %5.0f MB file   %5.3f sec mean\n',...
                    te(iPass), physicalMemoryUsed/(2^30), getfield(dir(matFile),'bytes')/(2^20), mean(te));
            else
                fprintf('%7.3f sec      %5.3f GB used\n', te(iPass), physicalMemoryUsed/(2^30));
            end
        end
        % Verify data to make sure we are sane
        gotBack = load(matFile);
        gotBack = rmfield(gotBack, intersect({'dummy'}, fieldnames(gotBack)));
        if ~isequal(gotBack, testData)
            fprintf('ERROR: Loaded data differs from original for %s %s\n', formatFlag, testName);
        end
    end
end

% Clean up
rmdir(tempDir, 's');

%%
function saveSimple(file, data, formatFlag)
save(file, '-struct', 'data', formatFlag);

%%
function out = physicalMemoryUsed()
if ~ispc
    out = NaN;
    return; % memory() only works on Windows
end
[u,s] = memory();
out = s.PhysicalMemory.Total - s.PhysicalMemory.Available;

%%
function saveBigAppend(file, data, formatFlag)
dummy = 0;
save(file, 'dummy', formatFlag);
fieldNames = fieldnames(data);
save(file, '-struct', 'data', fieldNames{:}, '-append', formatFlag);

%%
function saveMultiAppend(file, data, formatFlag)
fieldNames = fieldnames(data);
for i = 1:numel(fieldNames)
    if (i > 1); appendFlag = '-append'; else; appendFlag = ''; end
    save(file, '-struct', 'data', fieldNames{i}, appendFlag, formatFlag);
end


%%
function testData = generateTestData(imax)
nBlocks = 40;
blockSize = [65 480 240];
for i = 1:nBlocks
    testData.(sprintf('block_%03u', i)) = struct('blockNo',i,...
        'frames', randi([0 imax], blockSize, 'uint8'));
end

%%
function out = timestamp()
%TIMESTAMP Showing timestamps to make sure it is not a tic/toc problem
out = datestr(now, 'HH:MM:SS.FFF');

%%
function out = systemDescription()
if ispc
    platform = [system_dependent('getos'),' ',system_dependent('getwinsys')];
elseif ismac
    [fail, input] = unix('sw_vers');
    if ~fail
        platform = strrep(input, 'ProductName:', '');
        platform = strrep(platform, sprintf('\t'), '');
        platform = strrep(platform, sprintf('\n'), ' ');
        platform = strrep(platform, 'ProductVersion:', ' Version: ');
        platform = strrep(platform, 'BuildVersion:', 'Build: ');
    else
        platform = system_dependent('getos');
    end
else
    platform = system_dependent('getos');
end
arch = getenv('PROCESSOR_ARCHITEW6432');
if isempty(arch)
    arch = getenv('PROCESSOR_ARCHITECTURE');
end
try
    [~,sysMem] = memory();
catch
    sysMem.PhysicalMemory.Total = NaN;
end
out = sprintf('%s, arch=%s, %.0f GB, os=%s',...
    computer, arch, sysMem.PhysicalMemory.Total/(2^30), platform);

%%
function out = setupTempDir()
out = fullfile(tempdir, sprintf('%s - %s', mfilename, datestr(now, 'yyyymmdd-HHMMSS-FFF')));
mkdir(out);

EDIT: I modified the repro function, adding multiple iterations and parameterizing it for save styles, file formats, and imax for the randi generator.

I think filesystem caching is a big factor to the fast -append behavior. When I do a bunch of runs in a row with reproMatfileAppendSpeedup(20) and watch System Information in Process Explorer, most of them are under a second, and physical memory usage quickly ramps up by a couple GB. Then every dozen passes, the write stalls and takes 20 or 30 seconds, and physical RAM usage slowly ramps down to about where it started. I think this means that Windows is caching a lot of writes in RAM, and something about -append makes it more willing to do so. But the amortized time including those stalls is still a lot faster than the basic save, for me.

By the way, after doing multiple passes for a couple hours, I'm having a hard time reproducing the original timings.

回复收藏 0 原文