A file system shared by multiple threads
Here is the case:
I am talking about a general Linux concurrent programming environment.
Definitions:
Node: a machine with a processor.
File system: can be accessed both locally and remotely.
It contains a large set of files of varying sizes.
Node I: Process A, which is multithreaded, accesses A's file system; operations include
read and write. Process B is similar to A. Think about more similar processes
C, D, etc.
Then think about scaling. The same file system is located on a separate node.
It is operated on by processes E, F, G, etc. on Node II and by processes A, B, C, D on Node I.
Think about similar nodes III, IV, V, etc.
This is both a practical question and an interview question. Here is my solution:
I can use mutexes and signaling to coordinate multiple readers and writers of the same
file within a process, and IPC to handle communication and synchronization
between processes.
This code could work very well for multiple processes on a single node.
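For the single-process part, here is a minimal sketch of the mutex-plus-signal idea, written in Python for brevity. The class name `ReadWriteLock` and its method names are hypothetical, not taken from any library, and giving waiting writers priority over new readers is an assumption of this sketch rather than something the question requires.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers or a single writer (illustrative sketch)."""

    def __init__(self):
        self._cond = threading.Condition()  # mutex plus wait/notify
        self._readers = 0                   # readers currently inside
        self._writer = False                # True while a writer is inside
        self._writers_waiting = 0           # writers blocked in acquire_write

    def acquire_read(self):
        with self._cond:
            # New readers wait while a writer is active or queued.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers
```

One such lock per file would be kept in a table shared by the threads of the process. It only protects threads inside that process; other processes never see it.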
But when dealing with multiple nodes, we need a similar but more
complicated mechanism to detect whether any node is
writing to the file system. If so, wait; otherwise, acquire the
write mutex, write, and then notify the waiters.
After more thought, here is my idea:
From the point of view of NFS, file locks are of course defined per file.
My goal is:
at any moment, only one writer writes a given file;
more than one reader can read the file at the same time.
Then all the processes on the different nodes are treated the same way:
each should have its own mechanism to acquire either a read lock or a write lock,
and of course must deal with connections, failures, and retries.
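The read-lock/write-lock goal above could be sketched with advisory file locks. This assumes a Linux client and a file system where `fcntl`-style locks work; whether they are honored across nodes depends on the NFS setup. The helper name `locked`, the retry count, and the backoff delays are all hypothetical choices for illustration.

```python
import errno
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def locked(path, exclusive, retries=5, delay=0.1):
    """Hold a shared (read) or exclusive (write) advisory lock on path."""
    # An exclusive lockf() lock needs a descriptor opened for writing.
    flags = (os.O_RDWR | os.O_CREAT) if exclusive else os.O_RDONLY
    fd = os.open(path, flags, 0o644)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        for attempt in range(retries):
            try:
                # Non-blocking attempt, so a busy lock can be retried.
                fcntl.lockf(fd, mode | fcntl.LOCK_NB)
                break
            except OSError as exc:
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise  # a real failure, not just "lock is busy"
                time.sleep(delay * (2 ** attempt))  # exponential backoff
        else:
            raise TimeoutError(f"could not lock {path}")
        try:
            yield fd
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A writer would use `with locked(path, exclusive=True) as fd:` and a reader `with locked(path, exclusive=False) as fd:`. These locks belong to the process, not the thread, so threads inside one process still need their own mutex on top.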
I am wondering whether there is a prototype for this kind of problem?
Comments (1)
I assume "node" means "network node", i.e. an entity running its own copy of the operating system. It could be an actual machine or a virtual machine.
The question is pretty badly worded, honestly. It's not really clear what is being asked; I could assume that they're not asking about inter-node synchronization or locking.
So you're good on the first part: mutexes between threads, IPC semaphores between processes on the same machine.
If you want to handle interactions between separate nodes, first you need to have a networked filesystem, such as NFS or CIFS. Second, you need file locks (or lock files) to manage access to shared files. File locks can also be used at the other levels, inter-thread and inter-process, though they're not as straightforward as mutexes and semaphores.
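As an illustration of the "lock files" alternative, a sketch might look like the following. The function names `acquire_lock_file` and `release_lock_file` are hypothetical, and so are the retry parameters. It relies on `O_CREAT | O_EXCL` creating the file atomically, which is an assumption that should be checked for the network file system in use (older NFS versions did not guarantee it).

```python
import os
import socket
import time


def acquire_lock_file(lock_path, retries=10, delay=0.05):
    """Try to create lock_path exclusively; return True on success."""
    for _ in range(retries):
        try:
            # O_EXCL makes the open fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(delay)  # someone else holds the lock; retry later
            continue
        with os.fdopen(fd, "w") as f:
            # Record the holder so a stale lock can be diagnosed by hand.
            f.write(f"{socket.gethostname()}:{os.getpid()}\n")
        return True
    return False


def release_lock_file(lock_path):
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

A lock file is exclusive only, so it cannot express "many readers, one writer" by itself, and a holder that crashes leaves the file behind until something cleans it up.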
You could also build up a synchronization system from sockets, but that requires either each node to have a socket to every other node, which means on the order of N^2 sockets with possible race conditions, or a central clearinghouse node, which becomes a single point of failure.
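To make the central clearinghouse option concrete, here is a rough sketch of a tiny lock server. Everything in it is an assumption for illustration: the text protocol (`ACQUIRE name owner` / `RELEASE name owner`), the `OK` / `DENIED` / `ERROR` replies, and the names `LockTable`, `LockHandler`, `start_server`, and `request`. It grants exclusive locks only and keeps its table in memory.

```python
import socket
import socketserver
import threading


class LockTable:
    """In-memory table of exclusive locks, keyed by file name."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}

    def acquire(self, name, owner):
        with self._mutex:
            # Grant if the lock is free or already held by this owner.
            if self._owners.get(name, owner) != owner:
                return False
            self._owners[name] = owner
            return True

    def release(self, name, owner):
        with self._mutex:
            if self._owners.get(name) != owner:
                return False
            del self._owners[name]
            return True


class LockHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:  # one request per line
            parts = raw.decode().split()
            if len(parts) != 3 or parts[0] not in ("ACQUIRE", "RELEASE"):
                self.wfile.write(b"ERROR\n")
                continue
            verb, name, owner = parts
            table = self.server.table
            if verb == "ACQUIRE":
                ok = table.acquire(name, owner)
            else:
                ok = table.release(name, owner)
            self.wfile.write(b"OK\n" if ok else b"DENIED\n")


def start_server(host="127.0.0.1", port=0):
    """Start the clearinghouse in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LockHandler)
    server.daemon_threads = True
    server.table = LockTable()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(address, line):
    """Send one request line to the server and return its one-word reply."""
    with socket.create_connection(address) as conn:
        conn.sendall(line.encode() + b"\n")
        with conn.makefile() as reply:
            return reply.readline().strip()
```

A node would call something like `request(server.server_address, "ACQUIRE /data/a.txt node1")` before writing and `RELEASE` afterwards. The sketch makes the single point of failure visible: if this process dies, every lock is lost, and if a lock holder dies, its lock is never released.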