How to synchronize (make atomic) writes to one file from two processes?
I have two processes, each writing a large buffer of data, and I want to synchronize their writes to one file.
Process 1 writes buffer A, made up of (A1, A2, A3), and process 2 writes buffer B, made up of (B1, B2, B3). When each process uses the write() system call to write its whole buffer to the same file in one call, e.g. write(fd, A, sizeof(A)), what does the file look like afterwards?
- Is it A, B or maybe B, A?
- Or could it be interleaved, like A1, A2, B1, A3, ...?
I'm asking because I understand system calls to be atomic. What happens if the data buffer being written is very large? Do regular disk files behave like pipes in this respect?
Comments (3)
If you want the contents of both buffers to be present, you have to open the file with the O_APPEND flag set. The append flag seeks to the end of the file before each write. Without it, both processes may point at the same or overlapping regions of the file, and whoever writes last will overwrite what the other has written.
Each call to write will write up to the number of bytes requested. If your process is interrupted by a signal, you can end up with a partial write; the actual number of bytes written is returned. Whether or not all of your bytes are written, a single call writes one contiguous section of the file. You don't get the interleaving effect you mentioned as your second possibility (e.g. A1,B1,A2,B2,...) from one call.
If you only get a partial write, how you proceed is up to you. You can either continue writing (starting at the buffer offset by the number of bytes already written), or you can abandon the rest of your write. Only when you continue after a partial write could you potentially get the interleaving effect.
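A minimal sketch of the append-and-retry approach described above, assuming a POSIX system. The helper name append_all and the 0644 file mode are illustrative choices, not part of the original answer.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes from buf to the file at path.
 * Returns 0 on success, -1 on error (errno is set). */
int append_all(const char *path, const void *buf, size_t len)
{
    /* O_APPEND makes every write() start at the current end of file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        /* Partial write: skip the bytes already written and try again.
         * Another process may append in between, so the pieces of this
         * buffer are not guaranteed to end up adjacent in the file. */
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

Each process would call something like append_all("shared.log", A, sizeof(A)). If the buffer must never be split, this loop alone is not enough and some form of locking is still needed.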
If it's important to have the contents of one write complete before the other process writes, then you should look into locking the file for exclusive write access (which both processes will have to check for) before attempting to write any data.
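One way to sketch the exclusive-lock idea is with flock(), which is advisory: it only helps if both processes take the lock before writing. The function name locked_append is hypothetical, and flock() is a BSD/Linux call rather than part of POSIX.

```c
#define _DEFAULT_SOURCE   /* exposes flock() on glibc in strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append buf to path while holding an exclusive
 * advisory lock, so a cooperating process cannot write in the middle. */
int locked_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* Blocks until no other process holds the lock on this file. */
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        p += n;                 /* continue after a partial write */
        len -= (size_t)n;
    }

    flock(fd, LOCK_UN);         /* release before closing, for clarity */
    close(fd);
    return rc;
}
```

Because the lock is held across the whole loop, a partial write followed by a retry still produces one contiguous copy of the buffer, provided every writer uses the same locking scheme.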
Assuming that the buffers are of equal size and the file is opened without O_APPEND, both processes start writing at offset 0, so the result will be either A or B, depending on which process was scheduled last.
The write system call is atomic, yes, meaning that the result will be either A or B, not a mixture of both.
Assuming that you want both A and B in the file, you can open the file with O_APPEND; note that this is not reliable over NFS, though, where concurrent appends can corrupt the file.
Another option is for each process to keep track of which file offset it should use, and to write there using lseek() or pwrite().
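The offset-tracking option can be sketched with pwrite(), assuming the two processes have agreed in advance on non-overlapping regions (for example, process 1 owns offset 0 and process 2 owns offset sizeof(A)). The helper name write_at is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* exposes pwrite() in strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf at absolute offset off.
 * The file is opened WITHOUT O_APPEND, since on Linux an O_APPEND
 * descriptor makes pwrite() append regardless of the offset given. */
int write_at(const char *path, const void *buf, size_t len, off_t off)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* pwrite() does not use or move the descriptor's file position. */
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        off += n;               /* stay inside this process's own region */
        len -= (size_t)n;
    }
    return close(fd);
}
```

Since each process only touches its own region, the order in which the writes happen no longer affects the final layout of the file.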
You definitely need some form of synchronization between the programs that access the file, or you will end up with messed-up file contents. The write system call may write fewer bytes than you requested, so your blocks A1, A2 or B1, B2 may only be partially written. This might happen often or rarely, depending on many conditions. If it only happens once a week, you will have a bug that may be very hard to detect.
As a solution, you can use file locking (man 2 flock or man fcntl and search for locking). Another possibility is to use semaphores (man -k semaphore) to synchronize your programs' writes, or some other form of IPC.
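As a rough illustration of the fcntl route, the sketch below takes a POSIX advisory write lock on the whole file before writing. The function name fcntl_locked_append is made up for this example, and the lock only protects against other processes that also call fcntl() with the same kind of lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append buf to path under an fcntl() write lock. */
int fcntl_locked_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    struct flock lk = {0};
    lk.l_type = F_WRLCK;        /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;               /* 0 means "through end of file" */

    /* F_SETLKW waits for the lock; retry if a signal interrupts the wait. */
    while (fcntl(fd, F_SETLKW, &lk) < 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }

    int rc = 0;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        p += n;                 /* finish a partially written block */
        len -= (size_t)n;
    }

    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);    /* closing fd would also drop the lock */
    close(fd);
    return rc;
}
```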