Platform-independent file locking?
I'm running a very computationally intensive scientific job that spits out results every now and then. The job is basically to just simulate the same thing a whole bunch of times, so it's divided among several computers, which use different OSes. I'd like to direct the output from all these instances to the same file, since all the computers can see the same filesystem via NFS/Samba. Here are the constraints:
- Must allow safe concurrent appends. Must block if some other instance on another computer is currently appending to the file.
- Performance does not count. I/O for each instance is only a few bytes per minute.
- Simplicity does count. The whole point of this (besides pure curiosity) is so I can stop having every instance write to a different file and manually merging these files together.
- Must not depend on the details of the filesystem. Must work with an unknown filesystem on an NFS or Samba mount.
The language I'm using is D, in case that matters. I've looked, and there's nothing in the standard lib that seems to do this. Both D-specific and general, language-agnostic answers are fully acceptable and appreciated.
Comments (5)
Over NFS you face some problems with client-side caching and stale data. I have written an OS-independent lock module to work over NFS before. The simple idea of creating a [datafile].lock file does not work well over NFS. The basic idea to work around it is to invert the meaning of the lock file: [datafile].lock, if present, means the file is NOT locked, and a process that wants to acquire the lock renames that file to a different name like [datafile].lock.[hostname].[pid]. The rename is an atomic enough operation that works well enough over NFS to guarantee exclusivity of the lock. The rest is basically a bunch of fail-safes, loops, error checking and lock retrieval in case the process dies before releasing the lock and renaming the lock file back to [datafile].lock.
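To make the rename idea concrete, here is a minimal Python sketch. It is not the lock module mentioned above, just an illustration. The function names and the one-second polling interval are my own assumptions, and it leaves out the fail-safes and the recovery of locks left behind by dead processes.

```python
import os
import socket
import time


def acquire(datafile, poll=1.0):
    """Block until the 'unlocked' marker can be renamed to a name unique to us."""
    free = datafile + ".lock"
    mine = "%s.%s.%d" % (free, socket.gethostname(), os.getpid())
    while True:
        try:
            os.rename(free, mine)  # only one contender can rename the marker
            return mine
        except OSError:
            time.sleep(poll)  # marker is missing, so someone else holds the lock


def release(datafile, mine):
    """Rename our private lock file back so that others can take the lock."""
    os.rename(mine, datafile + ".lock")


def append_line(datafile, line):
    """Append one line to datafile while holding the rename-based lock."""
    mine = acquire(datafile)
    try:
        with open(datafile, "a") as f:
            f.write(line + "\n")
    finally:
        release(datafile, mine)
```

Note that the [datafile].lock marker has to be created once by hand (or by a setup script) before the first instance starts, otherwise every instance will wait forever.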
The classic solution is to use a lock file, or more accurately a lock directory. On all common OSes creating a directory is an atomic operation, so the routine is:
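A minimal Python sketch of such a lock-directory routine might look like the following. The function name, the ".lockdir" suffix and the polling interval are assumptions made for illustration, and it relies on directory creation being atomic on the shared mount as described above.

```python
import os
import time


def append_with_lock_dir(datafile, line, poll=1.0):
    """Append one line to datafile, using a directory as the lock."""
    lock_dir = datafile + ".lockdir"  # hypothetical name for the lock directory
    while True:
        try:
            os.mkdir(lock_dir)  # fails if the directory already exists
            break
        except FileExistsError:
            time.sleep(poll)  # someone else holds the lock, try again later
    try:
        with open(datafile, "a") as f:
            f.write(line + "\n")
    finally:
        os.rmdir(lock_dir)  # release the lock even if the write failed
```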
This has been used by applications such as CVS for many years across many platforms. The only problem occurs in the rare cases when your app crashes while writing and before removing the lock.
Why not just build a simple server which sits between the file and the other computers?
Then if you ever wanted to change the data format, you would only have to modify the server, and not all of the clients.
In my opinion building a server would be much easier than trying to use a Network file system.
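As a rough idea of how small such a server could be, here is a Python sketch using the standard socketserver module. The line-per-result protocol, the make_server name and the default port 9000 are all assumptions of mine, not something from the question.

```python
import socketserver
import threading


def make_server(output_path, host="", port=9000):
    """Build a TCP server that appends every line it receives to output_path."""
    lock = threading.Lock()  # serializes appends from concurrent connections

    class AppendHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # Each line a client sends becomes one line in the output file.
            for raw in self.rfile:
                with lock, open(output_path, "ab") as f:
                    f.write(raw.rstrip(b"\r\n") + b"\n")

    return socketserver.ThreadingTCPServer((host, port), AppendHandler)
```

You would run it on one machine with something like make_server("results.txt").serve_forever(), and each simulation instance would open a TCP connection and send its results one per line. Only the server ever touches the file, so no file locking is needed at all.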
Lock File with a twist
Like other answers have mentioned, the easiest method is to create a lock file in the same directory as the data file.
Since you want to be able to access the same file from multiple PCs, the best solution I can think of is to just include in the lock file the identifier of the machine currently writing to the data file.
So the sequence for writing to the data file would be:
- Check if there is a lock file present.
- If there is a lock file, see if I'm the one owning it by checking that its content has my identifier.
  - If that's the case, just write to the data file, then delete the lock file.
  - If that's not the case, just wait a second or a small random length of time and try the whole cycle again.
- If there is no lock file, create one with my identifier and try the whole cycle again to avoid a race condition (re-check that the lock file is really mine).
Along with the identifier, I would record a timestamp in the lock file and check whether it's older than a given timeout value.
If the timestamp is too old, then assume that the lock file is stale and just delete it, as it would mean one of the PCs writing to the data file may have crashed or its connection may have been lost.
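The cycle described above could be sketched in Python roughly as follows. The identifier format, the 300-second timeout and the function names are assumptions for illustration only. The sketch also assumes the clocks of the machines are roughly in sync, and it inherits the small race window of the check-then-create approach.

```python
import os
import random
import socket
import time

MY_ID = "%s.%d" % (socket.gethostname(), os.getpid())  # hypothetical identifier
STALE_AFTER = 300.0  # hypothetical timeout in seconds


def read_lock(lock_path):
    """Return (owner, timestamp), or None if the lock file cannot be parsed."""
    try:
        with open(lock_path) as f:
            owner, stamp = f.read().split()
        return owner, float(stamp)
    except (OSError, ValueError):
        return None


def append_with_lock_file(datafile, line, wait=1.0):
    """Append one line to datafile following the lock-file cycle."""
    lock_path = datafile + ".lock"
    while True:
        if not os.path.exists(lock_path):
            # No lock file: claim it, then loop to re-check that it is ours.
            with open(lock_path, "w") as f:
                f.write("%s %f" % (MY_ID, time.time()))
            continue
        lock = read_lock(lock_path)
        if lock is not None and lock[0] == MY_ID:
            # The lock is ours: write, then delete the lock file.
            with open(datafile, "a") as f:
                f.write(line + "\n")
            os.remove(lock_path)
            return
        if lock is not None and time.time() - lock[1] > STALE_AFTER:
            # Owner probably crashed or lost its connection: drop the lock.
            try:
                os.remove(lock_path)
            except OSError:
                pass
        time.sleep(wait + random.random() * wait)  # wait, then retry the cycle
```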
Another solution
If you are in control of the format of the data file, another option could be to reserve a structure at the beginning of the file to record whether it is locked or not.
If you just reserve a byte for this purpose, you could assume, for instance, that 00 would mean the data file isn't locked, and that other values would represent the identifier of the machine currently writing to it.
Issues with NFS
OK, I'm adding a few things because Jiri Klouda correctly pointed out that NFS uses client-side caching that will result in the actual lock file being in an undetermined state.
A few ways to solve this issue:
- Mount the NFS directory with the noac or sync options. This is easy but doesn't completely guarantee data consistency between client and server, so there may still be issues, although in your case it may be OK.
- Open the lock file or data file using the O_DIRECT, O_SYNC or O_DSYNC attributes. This is supposed to disable caching altogether. This will lower performance but will ensure consistency.
- You may be able to use flock() to lock the data file, but its implementation is spotty and you will need to check if your particular OS actually uses the NFS locking service. It may do nothing at all otherwise. If the data file is locked, then another client opening it for writing will fail. Oh yeah, and it doesn't seem to work on SMB shares, so it's probably best to just forget about it.
- Don't use NFS and just use Samba instead: there is a good article on the subject and why NFS is probably not the best answer to your usage scenario. You will also find in this article various methods for locking files.
- Jiri's solution is also a good one.
Basically, if you want to keep things simple, don't use NFS for frequently-updated files that are shared amongst multiple machines.
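For the O_SYNC route mentioned in the list above, a minimal Python sketch might look like this. The function name is made up, O_SYNC is not defined on every platform (hence the getattr fallback), and I've left O_DIRECT out because it imposes buffer alignment requirements. Whether this really bypasses the NFS client cache depends on the OS and mount options, so treat it as something to test rather than a guarantee.

```python
import os


def append_synced(path, data):
    """Append bytes to path, requesting synchronous writes where available."""
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    flags |= getattr(os, "O_SYNC", 0)  # not defined on every platform
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # ask the OS to flush this file's buffers
    finally:
        os.close(fd)
```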
Something different
Use a small database server to save your data into and bypass the NFS/SMB locking issues altogether, or keep your current system of multiple data files and just write a small utility to concatenate the results.
It may still be the safest and simplest solution to your problem.
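The concatenation utility really can be tiny. Here is a sketch in Python, where the function name and the glob-pattern interface are my own assumptions about how the per-instance files are named:

```python
import glob


def merge_results(pattern, merged_path):
    """Concatenate every file matching pattern into merged_path."""
    inputs = sorted(glob.glob(pattern))
    with open(merged_path, "wb") as out:
        for name in inputs:
            with open(name, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep results from different files on separate lines
    return inputs
```

For example, merge_results("results_*.txt", "all_results.txt") would do the manual merge in one call. Make sure the merged file's name does not match the pattern itself.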
I don't know D, but I think using a mutex file to do the job might work. Here's some pseudo-code you might find useful:
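A sketch of the mutex-file idea, written in Python rather than D. The ".mutex" suffix, the function name and the polling interval are assumptions, and it relies on exclusive file creation being atomic on the shared mount, which is worth verifying for your particular NFS/Samba setup.

```python
import os
import time


def append_with_mutex_file(datafile, line, poll=1.0):
    """Append one line to datafile, guarded by an exclusively created file."""
    mutex_path = datafile + ".mutex"  # hypothetical name for the mutex file
    while True:
        try:
            # "x" mode creates the file and fails if it already exists.
            mutex = open(mutex_path, "x")
            break
        except FileExistsError:
            time.sleep(poll)  # another process won, wait and retry
    try:
        with open(datafile, "a") as f:
            f.write(line + "\n")
    finally:
        mutex.close()
        os.remove(mutex_path)  # delete the mutex so others can proceed
```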
So, all processes will try to create the mutex file but only the one who wins will be able to continue. Once you write your output, close and delete the mutex so other processes can do the same.