Semaphores and locks in MATLAB
I am working on a MATLAB project where I would like to have two instances of MATLAB running in parallel and sharing data. I will call these instances MAT_1 and MAT_2. More specifically, the architecture of the system is:
- MAT_1 processes images sequentially, reading them one by one using imread, and outputs the result for each image using imwrite.
- MAT_2 reads the images output by MAT_1 using imread and outputs its result somewhere else.
One of the problems I think I need to address is to guarantee that MAT_2 reads an image output by MAT_1 only once MAT_1 has fully finished writing to it.
My questions are:
- How would you approach this problem? Do I need to use semaphores or locks to prevent race conditions?
- Does MATLAB provide any mechanism to lock files? (i.e. something similar to flock, but provided by MATLAB directly, and that works on multiple platforms, e.g. Windows & Linux). If not, do you know of any third-party library that I can use to build this mechanism in MATLAB?
EDIT:
- As @yoda points out below, the Parallel Computing Toolbox (PCT) allows for blocking calls between MATLAB workers, which is great. That said, I am particularly interested in solutions that do not require the PCT.
Why do I require MAT_1 and MAT_2 to run in parallel? The processing done in MAT_2 is slower on average (and more prone to crashing) than MAT_1, and the output of MAT_1 feeds other programs and processes (including human inspection) that do not need to wait for MAT_2 to do its job.
Answers:
- For a solution that allows for the implementation of semaphores but does not rely on the PCT, see Jonas' answer below.
- For other good approaches to the problem, see Yoda's answer below.
I would approach this using semaphores; in my experience the PCT is unreasonably slow at synchronization.
dfacto (another answer) has a great implementation of semaphores for MATLAB; however, it will not work on MS Windows. I improved on that work so that it would. The improved work is here: http://www.mathworks.com/matlabcentral/fileexchange/45504-semaphoreposixandwindows
This will perform better than interfacing with Java, .NET, the PCT, or file locks. It does not use the Parallel Computing Toolbox (PCT), and AFAIK semaphore functionality isn't in the PCT anyway (puzzling that they left it out!). It is possible to use the PCT for synchronization, but everything I tried in it was unreasonably slow.
To install this high-performance semaphore library into MATLAB, run this within the MATLAB interpreter:
mex -O -v semaphore.c
You'll need a MEX-supported C/C++ compiler installed to compile semaphore.c into a binary MEX file. That MEX file can then be called from your MATLAB code as shown in the example below.
Usage example:
Personally, I'd use the Parallel Computing Toolbox for this.
As far as I know, there is no straightforward way in MATLAB to get system-wide file locks. However, to ensure that MATLAB #2 only reads the output of MATLAB #1 once the file has been completely written, I suggest that after writing, e.g., the file results_1.mat, MATLAB #1 writes a second file, results_1.finished, which is an empty text file. Since the second file is written after the first, its existence signals that the results file has been written. You can thus search for files with the extension .finished, i.e. dir('*.finished'), and use fileparts to get the name of the .mat file you'd like to load with MATLAB #2.
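A minimal MATLAB sketch of this marker-file hand-off follows. For illustration, both sides are simulated in one script. Several details are assumptions and not part of the answer: the folder (a fresh tempname here, whereas the two real instances would need one fixed, shared path), the variable name result, and the step that deletes a marker once it has been consumed.

```matlab
% Shared results folder (hypothetical). A fresh tempname keeps the demo
% repeatable; the two real MATLAB instances need one fixed, shared path.
outDir = tempname;
mkdir(outDir);

% --- MAT_1 side: write the results file first, then the empty marker ---
result = magic(4);                                    % stand-in for a real result
baseName = 'results_1';
save(fullfile(outDir, [baseName '.mat']), 'result');  % file is closed when save returns
fid = fopen(fullfile(outDir, [baseName '.finished']), 'w');
fclose(fid);                                          % empty marker file

% --- MAT_2 side: only touch results whose marker already exists ---
markers = dir(fullfile(outDir, '*.finished'));
for k = 1:numel(markers)
    [~, name] = fileparts(markers(k).name);           % 'results_1.finished' -> 'results_1'
    S = load(fullfile(outDir, [name '.mat']));
    % ... MAT_2 processing of S.result goes here ...
    delete(fullfile(outDir, markers(k).name));        % hypothetical "consumed" step
end
```

Because MAT_2 only ever looks at .finished files, a half-written .mat file is never picked up.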
I am not sure if you are looking for a MATLAB-only solution, but I have just submitted a semaphore wrapper for use in MATLAB. It works as a generic semaphore, but it was mainly designed with sharedmatrix in mind.
As soon as MathWorks accepts the submission, I will update the link on my research group's blog.
Please note that this mex file is a wrapper for the POSIX semaphore functionality. As such it will work in Linux, Unix, MacOS but will not work out-of-the-box on Windows. It may work when compiled against cygwin libraries.
I don't think there is a foolproof way other than using OS-specific locks. One approach might be to have MAT_1 do:
And have MAT_2 only process completedFileName.
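The code that went with this suggestion did not survive. The name completedFileName fits a common pattern: write under a temporary name that MAT_2 ignores, then rename the file once it is complete. The sketch below assumes that pattern. The folder, the partial_ prefix, and the img_*.png naming are all hypothetical. On a local POSIX file system, a rename within one directory is atomic. Whether MATLAB's movefile gives the same guarantee on Windows or on network shares is an assumption worth testing.

```matlab
% Shared output folder (hypothetical); use one fixed path in the real setup.
outDir = tempname;
mkdir(outDir);

img = uint8(randi([0 255], 64, 64));                  % stand-in for a processed image

% --- MAT_1: write under a name MAT_2 never matches, then rename ---
completedFileName = fullfile(outDir, 'img_0001.png');
tempFileName      = fullfile(outDir, 'partial_img_0001.png');
imwrite(img, tempFileName);                           % slow write, invisible to MAT_2
[ok, msg] = movefile(tempFileName, completedFileName);
if ~ok
    error('Rename failed: %s', msg);
end

% --- MAT_2: only list completed names ('img_*' does not match 'partial_*') ---
done = dir(fullfile(outDir, 'img_*.png'));
for k = 1:numel(done)
    I = imread(fullfile(outDir, done(k).name));
    % ... MAT_2 processing of I goes here ...
end
```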
EDIT:
After seeing your edit, a simple solution not involving the use of any toolboxes is the following:
Since MAT_2 is much slower than MAT_1, start MAT_2 with a delay, i.e., start it when MAT_1 has finished processing, say, 5 images or so. If you do this, MAT_2 will never catch up with MAT_1 and hence will never be in a situation where it has to "wait" for images from MAT_1.
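Here is a rough sketch of the delayed start on the MAT_2 side. It assumes MAT_1 writes completed images as img_NNNN.png into a shared folder. The folder, the naming scheme, and the headStart value are hypothetical, and a few images are pre-written to simulate MAT_1. The head start only holds while MAT_2 stays behind MAT_1, so this sketch still relies on MAT_2 never overtaking it.

```matlab
% Hypothetical shared folder and naming scheme written by MAT_1.
outDir = tempname;
mkdir(outDir);
pattern = fullfile(outDir, 'img_*.png');
headStart = 5;                                        % "say 5 images or so"

% Simulate MAT_1 having already produced a few images.
for k = 1:6
    imwrite(uint8(k * ones(8)), fullfile(outDir, sprintf('img_%04d.png', k)));
end

% --- MAT_2: wait until MAT_1 is headStart images ahead, then start ---
while numel(dir(pattern)) < headStart
    pause(1);                                         % poll once per second
end
files = dir(pattern);
names = sort({files.name});                           % zero-padded names sort in order
for k = 1:numel(names)
    I = imread(fullfile(outDir, names{k}));
    % ... MAT_2 processing of I goes here ...
end
```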
I'm still not clear on a few things from your question:
- MAT_1 processes images sequentially, but does it have to? In other words, does the order in which they are processed matter?
- MAT_2 reads the output from MAT_1... Does it have to be in the order that MAT_1 finishes, or can it be any order?
- MAT_2 reads the image using imread and outputs it somewhere else. Is there any reason that task cannot be combined into MAT_1?
In any case, you can implement some form of execution blocking using the Parallel Computing Toolbox; but instead of using parfor loops (which is what most people use), you'll have to create a distributed job (example).
The important thing to note is that each worker (lab) has a labindex, and you can use labSend to send data from worker 1 (equivalent of MAT_1) to worker 2 (equivalent of MAT_2), who then receives it using labReceive. According to the documentation on labReceive, the call blocks until the matching labSend from the sending worker arrives, which is pretty much what you wanted to do with MAT_1 and MAT_2.
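Below is a minimal sketch of the labSend/labReceive hand-off. The answer mentions a distributed job; this sketch instead uses an spmd block in an interactive pool, which is an assumption made for brevity. It still needs the Parallel Computing Toolbox and at least two workers. Worker 1 plays MAT_1 and worker 2 plays MAT_2. An empty message serves as a hypothetical "no more images" sentinel.

```matlab
% Requires the Parallel Computing Toolbox and a pool with at least 2 workers.
if isempty(gcp('nocreate'))
    parpool(2);
end

numImages = 3;                                        % hypothetical workload size
spmd (2)
    received = {};
    if labindex == 1
        % Worker 1 plays MAT_1: produce results and push them to worker 2.
        for k = 1:numImages
            result = k * ones(4);                     % stand-in for processing image k
            labSend(result, 2);
        end
        labSend([], 2);                               % sentinel: no more images
    else
        % Worker 2 plays MAT_2: labReceive blocks until worker 1 sends.
        while true
            data = labReceive(1);
            if isempty(data)                          % assumes real results are never empty
                break;
            end
            received{end + 1} = data;                 %#ok<AGROW>
        end
    end
end
out = received{2};                                    % Composite: value held by worker 2
```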
Another way to do this would be to spawn one additional worker in your current session, but only assign tasks performed by MAT_1 to it. You then set the FinishedFcn property for the tasks to execute the set of functions performed by MAT_2, but I wouldn't recommend it, as I don't think this was the intent of FinishedFcn, and I don't know if it will break in certain cases.
I would also recommend looking at the Parallel Computing Toolbox for such a thing; the functionality you want should be in there somewhere. I think that is cleaner than trying to synchronize two instances of MATLAB (unless you are forced to use two instances).
In the odd case that there is no such thing, you might also look at different environments to implement what you want. It might be a bit of a workaround, but you can always interface your MATLAB code with other languages (e.g. Java, .NET, C, ...) and use the functionality you are accustomed to there. With Java you can be fairly sure that your solution is platform-independent, whereas .NET only works on Windows (at least in combination with MATLAB).
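Here is a sketch of the Java route. MATLAB ships with a JVM, so java.nio file locks can be called directly, which gives something close to flock on both Windows and Linux. The lock-file path is hypothetical. According to the Java documentation, these locks are held on behalf of the whole JVM, so they coordinate separate MATLAB processes but not code within one session. Whether the locks are advisory or mandatory depends on the platform, so both instances must follow the same locking protocol.

```matlab
% Hypothetical lock file that MAT_1 and MAT_2 agree on (must be the same path).
lockFile = fullfile(tempdir, 'mat_pipeline.lock');

raf = java.io.RandomAccessFile(lockFile, 'rw');       % creates the file if it is missing
channel = raf.getChannel();
fileLock = channel.lock();      % blocks until no other process holds this lock
try
    % Critical section: e.g. MAT_1 writes an image, or MAT_2 reads one.
    heldInside = fileLock.isValid();
catch err
    raf.close();                % closing the file also releases the lock
    rethrow(err);
end
fileLock.release();             % let the other MATLAB instance proceed
raf.close();
```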