Deleting variables from a .mat file

Posted on 2024-10-03 20:08:53

Does anyone here know how to delete a variable from a matlab file? I know that you can add variables to an existing matlab file using the save -append method, but there's no documentation on how to delete variables from the file.

Before someone says, "just save it", it's because I'm saving intermediate processing steps to disk to alleviate memory problems, and in the end there will be almost 10 GB of intermediate data per analysis routine. Thanks!

Comments (4)

浴红衣 2024-10-10 20:08:53

Interestingly enough, you can use the -append option with SAVE to effectively erase data from a .mat file. Note this excerpt from the documentation, in particular the part about replacing existing variables:

For MAT-files, -append adds new variables to the file or replaces the saved values of existing variables with values in the workspace.

In other words, if a variable in your .mat file is called A, you can save over that variable with a new copy of A (that you've set to []) using the -append option. There will still be a variable called A in the .mat file, but it will be empty and thus reduce the total file size.

Here's an example:

>> A = rand(1000);            %# Create a 1000-by-1000 matrix of random values
>> save('savetest.mat','A');  %# Save A to a file
>> whos -file savetest.mat    %# Look at the .mat file contents
  Name         Size                Bytes  Class     Attributes

  A         1000x1000            8000000  double

The file size will be about 7.21 MB. Now do this:

>> A = [];                              %# Set the variable A to empty
>> save('savetest.mat','A','-append');  %# Overwrite A in the file
>> whos -file savetest.mat              %# Look at the .mat file contents
  Name      Size            Bytes  Class     Attributes

  A         0x0                 0  double

And now the file size will be around 169 bytes. The variable is still in there, but it is empty.
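
If overwriting A in your workspace is inconvenient, the same trick can be wrapped in a small helper that never touches the caller's variables. This is only a sketch: blankvar is a hypothetical name, not a MATLAB built-in, and it assumes that save accepts '-struct' together with '-append' in your release.

```matlab
% Demo: create a file with two variables, then blank one of them.
matFile = [tempname '.mat'];
A = rand(200);
B = 1:10;
save(matFile, 'A', 'B');

blankvar(matFile, 'A');

info = whos('-file', matFile);   % A is still listed, but as 0x0
disp({info.name});
delete(matFile);                 % Clean up the demo file

function blankvar(matFile, varName)
%BLANKVAR Replace a saved variable with [] (hypothetical helper).
    s = struct(varName, {[]});   % One field named varName holding []
    save(matFile, '-struct', 's', '-append');
end
```

The variable name is passed as text, so the workspace copy of A stays intact while the copy in the file is emptied.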

甜中书 2024-10-10 20:08:53

10 GB of data? Updating multi-variable MAT files could get expensive due to MAT format overhead. Consider splitting the data up and saving each variable to a different MAT file, using directories for organization if necessary. Even if you had a convenient function to delete variables from a MAT file, it would be inefficient. The variables in a MAT file are laid out contiguously, so replacing one variable can require reading and writing much of the rest. If they're in separate files, you can just delete the whole file, which is fast.

To see this in action, try this code, stepping through it in the debugger while using something like Process Explorer (on Windows) to monitor its I/O activity.

function replace_vars_in_matfile

x = 1;
% Random dummy data; zeros would compress really well and throw off results
y = randi(intmax('uint8')-1, 100*(2^20), 1, 'uint8');

tic; save test.mat x y; toc;
x = 2;
tic; save -append test.mat x; toc;
y = y + 1;
tic; save -append test.mat y; toc;

On my machine, the results look like this. (Read and Write are cumulative, Time is per operation.)

                    Read (MB)      Write (MB)       Time (sec)
before any write:   25             0
first write:        25             105              3.7
append x:           235            315              3.6
append y:           235            420              3.8

Notice that updating the small x variable is more expensive than updating the large y. Much of this I/O activity is "redundant" housekeeping work to keep the MAT file format organized, and will go away if each variable is in its own file.

Also, try to keep these files on the local filesystem; it'll be a lot faster than network drives. If they need to go on a network drive, consider doing the save() and load() on local temp files (maybe chosen with tempname()) and then copying them to/from the network drive. Matlab's save and load tend to be much faster with local filesystems, enough so that local save/load plus a copy can be a substantial net win.
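
The save-locally-then-transfer idea could look roughly like the sketch below. saveViaTemp is a hypothetical helper name, and it moves the temp file rather than copying it; in the demo a second temp path stands in for the network location.

```matlab
% Demo: a second temp location stands in for the network drive.
destFile = [tempname '.mat'];
data = struct('x', 1, 'y', magic(4));
saveViaTemp(destFile, data);
loaded = load(destFile);
disp(fieldnames(loaded));
delete(destFile);                          % Clean up the demo file

function saveViaTemp(destFile, s)
%SAVEVIATEMP Save struct fields locally, then move the file (hypothetical helper).
    tmpFile = [tempname '.mat'];           % Unique name under tempdir
    save(tmpFile, '-struct', 's');         % Local write
    [ok, msg] = movefile(tmpFile, destFile, 'f');
    if ~ok
        error('Could not move %s to %s: %s', tmpFile, destFile, msg);
    end
end
```

Whether this is actually a net win depends on your network and file sizes, so time it on your own setup before committing to it.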


Here's a basic implementation that will let you save variables to separate files using the familiar save() and load() signatures. They're prefixed with "d" to indicate they're the directory-based versions. They use some tricks with evalin() and assignin(), so I thought it would be worth posting the full code.

function dsave(file, varargin)
%DSAVE Like save, but each var in its own file
%
% dsave filename var1 var2 var3...
if nargin < 1 || isempty(file); file = 'matlab';  end
[tfStruct,loc] = ismember({'-struct'}, varargin);
args = varargin;
args(loc(tfStruct)) = [];
if ~all(cellfun(@isvarname, args))
    error('Invalid arguments. Usage: dsave filename <-struct> var1 var2 var3 ...');
end
if tfStruct
    structVarName = args{1};
    s = evalin('caller', structVarName);
else
    varNames = args;
    if isempty(args)
        w = evalin('caller','whos');
        varNames = { w.name };
    end
    captureExpr = ['struct(' ...
        join(',', cellfun(@(x){sprintf('''%s'',{%s}',x,x)}, varNames)) ')'];
    s = evalin('caller', captureExpr);
end

% Use Java checks to avoid partial path ambiguity
jFile = java.io.File(file);
if ~jFile.exists()
    ok = mkdir(file);
    if ~ok; 
        error('failed creating dsave dir %s', file);
    end
elseif ~jFile.isDirectory()
    error('Cannot save: destination exists but is not a dir: %s', file);
end
names = fieldnames(s);
for i = 1:numel(names)
    varFile = fullfile(file, [names{i} '.mat']);
    varStruct = struct(names{i}, {s.(names{i})});
    save(varFile, '-struct', 'varStruct');
end

function out = join(Glue, Strings)
Strings = cellstr(Strings);
if length( Strings ) == 0
    out = '';
elseif length( Strings ) == 1
    out = Strings{1};
else
    Glue = sprintf( Glue ); % Support escape sequences
    out = strcat( Strings(1:end-1), { Glue } );
    out = [ out{:} Strings{end} ];
end

Here's the load() equivalent.

function out = dload(file,varargin)
%DLOAD Like load, but each var in its own file
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
if ~exist(file, 'dir')
    error('Not a dsave dir: %s', file);
end
if isempty(varNames)
    % dir() rather than ls(): the output format of ls differs between platforms
    d = dir(fullfile(file, '*.mat'));
    varNames = regexprep({d.name}, '\.mat$', '');
end

% Load each variable from its own file into one struct
out = struct;
for i = 1:numel(varNames)
    name = varNames{i};
    tmp = load(fullfile(file, [name '.mat']));
    out.(name) = tmp.(name);
end

% Like load(): with no output requested, assign into the caller's workspace
if nargout == 0
    for i = 1:numel(varNames)
        assignin('caller', varNames{i}, out.(varNames{i}));
    end
    clear out
end

dwhos() is the equivalent of whos('-file').

function out = dwhos(file)
%DWHOS List variable names in a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
d = dir(fullfile(file, '*.mat'));
out = regexprep({d.name}, '\.mat$', '');

And ddelete() to delete the individual variables like you asked.

function ddelete(file,varargin)
%DDELETE Delete variables from a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
for i = 1:numel(varNames)
    delete(fullfile(file, [varNames{i} '.mat']));
end

半葬歌 2024-10-10 20:08:53

The only way of doing this that I know is to use the MAT-file API function matDeleteVariable. It would, I guess, be quite easy to write a Fortran or C routine to do this, but it does seem like a lot of effort for something that ought to be much easier.

彻夜缠绵 2024-10-10 20:08:53

I suggest you load the variables you want to keep from the .mat file, and save them to a new .mat file. If necessary, you can load and save (using '-append') in a loop.

S = load(filename, '-mat', variablesYouWantToKeep{:});  %# variablesYouWantToKeep: cell array of names
save(newFilename, '-struct', 'S');  %# S contains only the kept variables
%# then you can delete the old file
delete(filename)
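
The loop variant mentioned above might look like the following sketch, which copies one variable at a time so that only a single variable is in memory at once. copyExcept is a hypothetical helper name, and the file and variable names in the demo are made up.

```matlab
% Demo: build a file with three variables, then copy all but one.
srcFile = [tempname '.mat'];
dstFile = [tempname '.mat'];
a = 1; b = rand(10); c = 'text';
save(srcFile, 'a', 'b', 'c');

copyExcept(srcFile, dstFile, {'b'});

info = whos('-file', dstFile);
disp({info.name});               % Names of the variables that were kept
delete(srcFile); delete(dstFile);

function copyExcept(srcFile, dstFile, dropNames)
%COPYEXCEPT Copy every variable except dropNames (hypothetical helper).
    info = whos('-file', srcFile);
    keep = setdiff({info.name}, dropNames);
    for i = 1:numel(keep)
        s = load(srcFile, '-mat', keep{i});   % One variable at a time
        if i == 1
            save(dstFile, '-struct', 's');    % Creates the new file
        else
            save(dstFile, '-struct', 's', '-append');
        end
    end
end
```

If every variable is dropped, nothing is written and dstFile is never created, so check for that case before deleting the original.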