How can I share memory between MATLAB processes?

Posted on 2024-07-20 16:42:15

Is there any way to share memory between MATLAB processes on the same computer?

I am running several MATLAB processes on a multi-core computer (running Windows, if it matters). They all use the same gigantic input data. It would be nice to only have a single copy of it in memory.

Edit: Unfortunately each process needs access to the whole gigantic input data, so there is no way to divide the data and conquer the problem.

Comments (4)

⊕婉儿 2024-07-27 16:42:15

If the processes only ever read the data, but do not modify it, then I believe you can place your input data into one large file and have each process open and read from that file. Each process will have its own file position indicator that it can move anywhere in the file to read the data it needs. I tested having two MATLAB processes reading simultaneously from a file a million or so times each and everything seemed to work fine. I only used basic file I/O commands (listed below). It appears you could also do this using MEMMAPFILE, as Mr Fooz mentioned in his answer (and SCFrench in a comment), assuming you have MATLAB version R2008a or newer.

Here are some of the file I/O commands that you will likely use for this:

  • FOPEN: Each process calls FOPEN, which returns a file identifier to use in all subsequent calls. You can open a file in either binary or text mode:

    fid = fopen('data.dat','r');   % Binary mode
    fid = fopen('data.txt','rt');  % Text mode
    
  • FREAD: In binary mode, FREAD will read data from the file:

    A = fread(fid,20,'double');  % Reads 20 double-precision values
    
  • FSCANF: In text mode, FSCANF will read and format data from the file:

    A = fscanf(fid,'%d',4);  % Reads 4 integer values
    
  • FGETL/FGETS: In text mode, these will read whole lines from the file.

  • FTELL: This will tell you the current file position indicator in bytes from the beginning of the file:

    ftell(fid)
    ans =
         8    % The position indicator is 8 bytes from the file beginning
    
  • FSEEK: This will set the file position indicator to a desired position in the file:

    fseek(fid,0,-1);  % Moves the position indicator to the file beginning
    
  • FCLOSE: Each process will have to close its access to the file (it's easy to forget to do this):

    fclose(fid);
    

This solution will likely require that the input file has a well-structured format that is easy to traverse (e.g. just one large matrix). If it has lots of variable-length fields, then reading data from the correct position in the file could get very tricky.
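
As a rough illustration of why a single large matrix is easy to traverse, here is a minimal sketch that reads one column out of a binary file without loading the rest. The matrix size, the file name, and the column index are made up for the example; the only assumption about the file is that it holds raw doubles in MATLAB's column-major order, which is how FWRITE stores a matrix.

```matlab
% Hypothetical setup: a 1000-by-50 matrix of doubles written as raw binary
nRows = 1000;
nCols = 50;
X = reshape(1:nRows*nCols, nRows, nCols);
fileName = [tempname '.dat'];              % illustrative file name

fid = fopen(fileName, 'w');
fwrite(fid, X, 'double');                  % elements are written in column order
fclose(fid);

% In each reader process: fetch column j without reading the whole file
j = 7;
bytesPerValue = 8;                         % size of one double in bytes
offset = (j - 1) * nRows * bytesPerValue;  % byte offset where column j starts

fid = fopen(fileName, 'r');
status = fseek(fid, offset, 'bof');        % 'bof' is equivalent to -1; 0 means success
col = fread(fid, nRows, 'double');         % nRows-by-1 column vector
fclose(fid);

delete(fileName);                          % clean up the example file
```

With fixed-size elements the offset is simple arithmetic; with variable-length records you would need an index of record positions instead.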


If the processes have to also modify the data, this could get even more difficult. In general, you don't want a file/memory location being simultaneously written to by multiple processes, or written to by one process while another is reading from the same location, since unwanted behavior can result. In such a case, you would have to limit access to the file such that only one process at a time is operating on it. Other processes would have to wait until the first is done. A sample version of code that each process would have to run in such a case is:

processDone = false;
while ~processDone
  if file_is_free()   % Placeholder you must write: checks that other
                      %   processes are not accessing the file
    fid = fopen(fileName,'r+');  % Open the file for reading and writing
    perform_process(fid);        % Placeholder: the computation this process has to do
    fclose(fid);                 % Close the file
    processDone = true;
  else
    pause(0.1);                  % Wait briefly instead of spinning at full speed
  end
end

Synchronization mechanisms like these ("locks") can sometimes have a high overhead that reduces the overall parallel efficiency of the code.
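
MATLAB has no built-in cross-process lock that I know of, so the file_is_free check above has to come from somewhere. One possible sketch, which is not part of the original answer, borrows a Java file lock through MATLAB's Java interface (java.io.RandomAccessFile and FileChannel.tryLock). The lock file path is hypothetical and must be the same in every process; lock behavior also depends on the operating system and file system, so treat this as a starting point.

```matlab
% Hypothetical lock file; every cooperating process must use the same path
lockName = fullfile(tempdir, 'shared_data.lock');

raf = java.io.RandomAccessFile(lockName, 'rw');
channel = raf.getChannel();

% tryLock returns Java null (seen as [] in MATLAB) when another process
% already holds the lock, so poll until it succeeds
lock = channel.tryLock();
while isempty(lock)
    pause(0.1);
    lock = channel.tryLock();
end
gotLock = lock.isValid();

% ... critical section: fopen(fileName,'r+'), modify the data, fclose ...

lock.release();      % let the next waiting process proceed
channel.close();
raf.close();
```

Unlike a separate check followed by FOPEN, acquiring the lock is a single step, so two processes cannot both conclude that the file is free.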

木森分化 2024-07-27 16:42:15

You may want to check out my Matlab File Exchange submission "sharedmatrix" (#28572). It allows a Matlab matrix to exist in shared memory, provided you are using some flavor of Unix. One could then attach the shared matrix in the body of a parfor or spmd, i.e.,

shmkey=12345;
sharedmatrix('clone',shmkey,X);
clear X;
spmd(8)
    X=sharedmatrix('attach',shmkey);
    % do something with X
    sharedmatrix('detach',shmkey,X);
end
sharedmatrix('free',shmkey);

Since X exists in shared memory for the body of the spmd (or parfor) it has no load time and no communication time. From the perspective of Matlab it is a newly created variable in the spmd (or parfor) body.

Cheers,

Josh

http://www.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix

童话里做英雄 2024-07-27 16:42:15

EDIT: Put the data in a raw file and use memmapfile (thanks SCFrench).
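
A minimal sketch of what that might look like, assuming the data is a single matrix of doubles. The dimensions, the file name, and the field name X are invented for the example.

```matlab
% One-time step: dump the matrix to a raw binary file (illustrative name)
nRows = 1000;
nCols = 50;
X = rand(nRows, nCols);
fileName = [tempname '.dat'];

fid = fopen(fileName, 'w');
fwrite(fid, X, 'double');
fclose(fid);

% In each MATLAB process: map the file read-only instead of loading it
m = memmapfile(fileName, ...
    'Format', {'double', [nRows nCols], 'X'}, ...
    'Writable', false);

col = m.Data.X(:, 7);    % only the pages that are touched get read from disk

clear m                  % unmap before deleting (needed on Windows)
delete(fileName);
```

Because every process maps the same file, the operating system can usually serve all of them from one cached copy of the pages, which is as close to sharing the data as this approach gets. Note that indexing m.Data.X copies the selected part into an ordinary MATLAB array.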

============================================

No, there is no real way of doing it.

My top two solutions have been: buy more RAM or page in the data.

The closest thing you could do would be to use a mex function to allocate shared memory, then allow successive calls to the mex function to extract out smaller slices of the memory. You wouldn't want to wrap the shared memory as a Matlab array (because Matlab's memory model wouldn't handle it well).

I was going to suggest looking into memmap, but apparently it's problematic.

Sometimes you can first run one Matlab program to pre-process or split up the data into smaller chunks. Then each of the Matlab processes can operate on its own smaller chunk.

Here's a tutorial on dealing with large datasets in Matlab.

安稳善良 2024-07-27 16:42:15

Probably not, at least not in the way where you treat the data like a regular MATLAB variable.

If on a Windows machine, you could create a COM/ActiveX wrapper to access your shared data. MATLAB allows the use of COM objects through the actxserver function. But it's questionable whether you could actually access the data "directly" through different processes. There's some kind of marshaling layer between MATLAB and COM and data gets converted, at least according to the Mathworks docs on exchanging data between MATLAB and COM. If I absolutely had to share structured data between processes, with fast access, on a Windows machine, I'd probably write something in C++ to use shared memory via Boost::interprocess and wrap access to it in an in-process COM server (DLL). I've done this before, once. As much as Boost::interprocess makes it a lot easier, it's a pain.

The Java approach (since MATLAB runs on top of Java) would be much more promising, but as far as I know, there aren't any decent Java libraries to provide access to shared memory. The closest thing is probably to use a memory-mapped file via java.nio.MappedByteBuffer, but that's really low-level. Still, if your data is in a relatively "square" form (e.g. a big 2-D or 3-D or 4-D matrix of homogeneously-sized data) this might work OK.

You could try to use HDF5 files; MATLAB has built-in HDF5 support and it's "relatively" fast. But in my experience, HDF5 doesn't seem to play very well with concurrency (at least not when one process is writing and the others are reading; if there are multiple readers and no writers, it works just fine).
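
For the read-only case, a small sketch using MATLAB's high-level HDF5 functions (h5create, h5write, h5read) could look like this. The file name and the dataset name /X are made up, and the example only shows partial reads; it does not address concurrent writers.

```matlab
% One-time step: store the matrix in an HDF5 dataset (illustrative names)
X = rand(1000, 50);
fileName = [tempname '.h5'];
h5create(fileName, '/X', size(X));     % dataset of doubles by default
h5write(fileName, '/X', X);

% In each reader process: pull out just column 7
start = [1 7];                         % 1-based index of the first element
count = [size(X, 1) 1];                % number of elements along each dimension
col = h5read(fileName, '/X', start, count);

delete(fileName);                      % clean up the example file
```

Each reader only brings the requested block into memory, so the full dataset never needs to be loaded in any single process.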
