Classic file system problem: concurrent remote processing on a directory
I have an application that processes files in a directory and moves them to another directory along with the processed output. Nothing special about that. An interesting requirement was introduced:
Implement fault tolerance and processing throughput by allowing multiple remote instances to work on the same file store.
An additional consideration is that we cannot assume a particular file system, as we support both Windows and NFS.
Of course, the problem is: how do I make sure that the different instances do not try to process the same work, potentially corrupting it or reducing throughput? File locking can be problematic, especially across network shares. We could use a more sophisticated method, such as a simple database or a messaging framework (a la JMS or similar), but the entire cluster needs to be fault tolerant. We can't have a single database or messaging provider because of the single point of failure it introduces.
We've implemented a solution that uses multicast messages to self-discover processing instances and elect a supervisor who assigns work. There's a timeout in case the supervisor goes down, after which another election takes place. Our networking library, however, isn't very mature, and our implementation of messages is clunky.
My instincts, however, tell me that there is a simpler way.
Thoughts?
Comments (1)
I think you can safely assume that rename operations are atomic on all network file systems that you care about. So if you arrange a unit of work to be a single file (or keyed to a single file), have each server first list the directory containing new work, pick a piece of work, and then rename that file to its own server name (say, machine name or IP address). For exactly one of the instances that concurrently attempt the same rename, it will succeed, so that instance should then process the work. For the others, it will fail, so they should pick a different file from the listing they got.
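The claim-by-rename idea above could be sketched roughly as follows in Python. This is only an illustration, not a tested cluster implementation: the function name `claim_next`, the `.claimed-by-` naming scheme, and the use of the host name as the worker id are all assumptions made for the example, and it relies on the same assumption as the answer, namely that rename is atomic on the file system in use.

```python
import os
import socket

CLAIM_MARKER = ".claimed-by-"  # hypothetical naming scheme for claimed files


def claim_next(work_dir, worker_id=None):
    """Try to claim one unclaimed file in work_dir by renaming it.

    Returns the path of the claimed file, or None if nothing could be claimed.
    """
    worker_id = worker_id or socket.gethostname()
    # Take one listing, then walk it; files claimed by others are skipped.
    for name in sorted(os.listdir(work_dir)):
        if CLAIM_MARKER in name:
            continue  # already claimed by some instance
        src = os.path.join(work_dir, name)
        dst = os.path.join(work_dir, name + CLAIM_MARKER + worker_id)
        try:
            os.rename(src, dst)
        except OSError:
            # Most likely another instance renamed the file first,
            # so the source no longer exists. Try the next candidate.
            continue
        return dst
    return None
```

An instance would call `claim_next("/path/to/incoming")` in a loop and process whatever path it gets back. Because the target name contains the worker id, two instances never rename to the same destination, so the only thing they compete for is the source file.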
For creation of new work, assume that directory creation (mkdir) is atomic, but file creation is not (for file creation, the second writer might overwrite the existing file). So if there are also multiple producers of work, create a new directory for each piece of work.
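On the producer side, a minimal Python sketch of the directory-per-job idea might look like this. The function name `submit_work` and the `payload` file name are hypothetical, and the sketch assumes, as the answer does, that `mkdir` either creates the directory or fails if it already exists.

```python
import os


def submit_work(incoming_dir, job_name, payload):
    """Reserve job_name by creating a directory, then write the payload into it.

    Returns the job directory path, or None if another producer
    already created a job with the same name.
    """
    job_dir = os.path.join(incoming_dir, job_name)
    try:
        # mkdir fails if the directory exists, so only one producer wins.
        os.mkdir(job_dir)
    except FileExistsError:
        return None
    # Only the producer that created the directory writes into it.
    # Note: a consumer could see the directory before this write finishes;
    # how to signal completion is left out of this sketch.
    with open(os.path.join(job_dir, "payload"), "wb") as f:
        f.write(payload)
    return job_dir
```

A producer that gets `None` back would pick a different job name and try again, rather than writing into the existing directory.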