Load a specific variable with indices from a MAT-file
I have a framework on a machine with lots of RAM which produces MAT-files with one very large and specifically named matrix. The computation of this matrix is carried out only once and takes a lot of time. Finally, it is stored to a MAT-file on disk.
During the usage phase, this MAT-file should be loaded. The problem is that I don't need all the data, only a certain selection of columns from that matrix.
For example, I have a matrix 'sign' in a file crfh.mat, of size [500x250000] and type double. I may be interested in loading only the column vectors selected by 'ids' from that matrix:
sign( :, ids )
Is there a way to do that? I searched the web and no one seems to have expressed the need for such functionality. I am thinking of writing a MEX function select_mat(), like:
sign_sub = select_mat( mat_file, var_name, ids );
Comments (2)
If you have one really large matrix that you only want to load parts of, I would not save it as a .MAT file. It would be more efficient to write the matrix to its own binary file. Then you could use functions like FSEEK to skip to various indexed points in the file and read only what you need. For example, let's first save a smaller sample matrix to a binary file using the function FWRITE:
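The code for this step is not present on this page. A minimal sketch of what it could look like, assuming a 4-by-4 sample matrix from `magic(4)` and a hypothetical file name `sample.bin`:

```matlab
% Sketch only: the matrix size and the file name 'sample.bin' are
% illustrative assumptions, not taken from the original answer.
A = magic(4);                      % 4-by-4 sample matrix
fid = fopen('sample.bin', 'w');    % open the file for writing
assert(fid ~= -1, 'Could not open sample.bin for writing');
fwrite(fid, A, 'double');          % values are written in column-major order
fclose(fid);
```

Because the values are written column by column, each column occupies one contiguous run of 4*8 bytes in the file, which is what makes seeking to a given column possible.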
Now, we can read just the third column using the functions FREAD and FSEEK:
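The code for this step is also missing here. A sketch under the same assumptions (4-by-4 matrix of doubles, hypothetical file `sample.bin`) might look like the following; it recreates the file first so that it runs on its own, and the final loop, which is my own extension, applies the same idea to a list of columns `ids` as in the question:

```matlab
% Sketch only: 'sample.bin' and the 4-by-4 size are illustrative assumptions.
% Recreate the sample file so that this snippet runs on its own.
A = magic(4);
fid = fopen('sample.bin', 'w');
fwrite(fid, A, 'double');
fclose(fid);

nRows = 4;                 % the reader has to know the number of rows
bytesPerValue = 8;         % size of one double in bytes

fid = fopen('sample.bin', 'r');
fseek(fid, 2*nRows*bytesPerValue, 'bof');    % skip the first two columns
col3 = fread(fid, [nRows 1], 'double');      % read exactly one column
fclose(fid);

% Same idea for an arbitrary list of columns, as in sign(:, ids)
ids = [4 1];
sub = zeros(nRows, numel(ids));
fid = fopen('sample.bin', 'r');
for k = 1:numel(ids)
    % byte offset of column ids(k), measured from the beginning of the file
    fseek(fid, (ids(k) - 1)*nRows*bytesPerValue, 'bof');
    sub(:, k) = fread(fid, [nRows 1], 'double');
end
fclose(fid);
```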
This is just a simple example. You would probably want to write other information to the file (i.e. header information) such as the byte size or data type of each value in the matrix. This may seem like more work than just dumping things to a .MAT file, and it is, but if the efficiency of file IO operations is a big concern it's better to create your own file format to handle your data in this case.
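As a rough illustration of such a header, one possible layout is two `uint32` values holding the number of rows and columns, followed by the data as doubles. The layout and the file name `sample_hdr.bin` are assumptions made for this sketch, not a fixed format:

```matlab
% Sketch only: header layout and file name are illustrative assumptions.
A = magic(4);
fid = fopen('sample_hdr.bin', 'w');
fwrite(fid, size(A), 'uint32');    % header: [nRows nCols], 2 x 4 bytes
fwrite(fid, A, 'double');          % data, column-major
fclose(fid);

fid = fopen('sample_hdr.bin', 'r');
dims = fread(fid, [1 2], 'uint32');    % read the header back (returned as double)
headerBytes = 2*4;                     % two uint32 values
colIdx = 3;                            % column to fetch
fseek(fid, headerBytes + (colIdx - 1)*dims(1)*8, 'bof');
col = fread(fid, [dims(1) 1], 'double');
fclose(fid);
```

With the dimensions stored in the file, the reading code no longer needs to know the matrix size in advance.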
You can load specific variables from a `.mat` file which has several variables. However, I don't think you can load just a set of arbitrary indices from within a variable in MATLAB. That said, if your problem is of the type where you need to access only specific rows/columns, then I might have a workaround for you.
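A side note for readers on newer releases: as far as I know, MATLAB R2011b and later provide `matfile`, which can read part of a variable from a MAT-file saved in the `-v7.3` format. The sketch below assumes such a release; the small matrix stands in for the 500x250000 one and `crfh_v73.mat` is a hypothetical file name. It reads one column per call because, to my knowledge, `matfile` indexing only accepts equally spaced ranges rather than arbitrary index vectors.

```matlab
% Sketch only: assumes MATLAB R2011b or later. The small matrix and the
% file name 'crfh_v73.mat' are illustrative assumptions.
S.sign = rand(5, 40);
save('crfh_v73.mat', '-struct', 'S', '-v7.3');   % v7.3 format for partial loading

m = matfile('crfh_v73.mat');          % does not load the data yet
ids = [7 3 21];                       % columns to fetch
[nRows, nCols] = size(m, 'sign');     % query the size without loading
sign_sub = zeros(nRows, numel(ids));
for k = 1:numel(ids)
    sign_sub(:, k) = m.sign(:, ids(k));   % read a single column from disk
end
```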
You can create a `struct` from the matrix, with each column as a separate field, and then save the `.mat` file with the `-struct` option so that each field gets saved as a separate variable. That way, you can pull out the one you want.
I'm not aware of a way to directly convert a matrix to a struct. So you first split the matrix into cells, one per column (the split is by column because you indicated that you need to access the columns; if you need the rows, you'll have to switch things around). This is the cell `dummyCell`. To save to a struct, we then need to generate field names. These are in the cell array of strings `fieldNames`, which holds names of the form `col1`, `col2`, etc. You can name them something meaningful if you want. Then we convert the `cell` to a `struct` by assigning each cell to the corresponding field name. Lastly, the MAT-file is saved with the `-struct` option, which tells MATLAB to save each field as a separate variable. All of this should be done when your program is saving the giant MAT-file.
Now if you need to access, say, `col52`, all you need to do is `load('myDummyFile','col52')`. You can also load more than one if you need to. Remember, this works well if your indexing requirements follow an order (i.e., whole rows or whole columns); if you need to access arbitrary indices in the matrix, it will not work. There might be some overhead in creating the cells/structs and saving them, but this will pay off if you save just once and access often.
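The code that went with this description is not present on this page. The steps described above can be sketched roughly as follows, using a small stand-in matrix. The names `dummy`, `dummyCell`, `fieldNames` and `myDummyFile` come from the text; `dummyStruct` and the way the field names are built are my own assumptions.

```matlab
% Sketch only: small stand-in matrix; dummyStruct is a hypothetical name.
dummy = rand(5, 100);
nCols = size(dummy, 2);

% Split the matrix into a 1-by-nCols cell, one column per cell
dummyCell = mat2cell(dummy, size(dummy, 1), ones(1, nCols));

% Generate the field names col1, col2, ...
fieldNames = arrayfun(@(k) sprintf('col%d', k), 1:nCols, ...
                      'UniformOutput', false);

% Convert the cell to a scalar struct with one field per column
dummyStruct = cell2struct(dummyCell, fieldNames, 2);

% Save each field as a separate variable in the MAT-file
save('myDummyFile', '-struct', 'dummyStruct');

% Later, in the usage phase: load only the column that is needed
s = load('myDummyFile', 'col52');
col52 = s.col52;
```

Calling `load` with an output argument returns the requested variables as fields of a struct, which avoids creating variables in the workspace implicitly.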
If your matrix is huge (500x250000 isn't all that huge by today's standards), you'll have to watch out for memory issues with this approach, because we're duplicating the entire matrix into a cell and a struct. I wrote it step by step so that it is easier to follow, but you can reduce the duplication by creating the cell from `dummy` and assigning it back to `dummy`, and similarly for the struct. However, this only reduces the number of copies by one, as MATLAB still has to hold a copy of the variable in memory in order to assign the result back to it.
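A sketch of that lower-duplication variant, again with a small stand-in matrix and field names built the way I assumed, could look like this. It reuses the variable `dummy` at every step instead of keeping the matrix, the cell, and the struct alive side by side:

```matlab
% Sketch only: small stand-in matrix; field-name generation is an assumption.
dummy = rand(5, 100);
nCols = size(dummy, 2);
fieldNames = arrayfun(@(k) sprintf('col%d', k), 1:nCols, ...
                      'UniformOutput', false);

% Overwrite the same variable at each step
dummy = mat2cell(dummy, size(dummy, 1), ones(1, nCols));   % matrix -> cell
dummy = cell2struct(dummy, fieldNames, 2);                 % cell -> struct

% Save each field as a separate variable, as before
save('myDummyFile', '-struct', 'dummy');
```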