Sharing data across processes on Linux
In my application, I have a process which forks off a child, say child1, and this child process writes a huge binary file to disk and exits. The parent process then forks off another child process, child2, which reads this huge file back in for further processing.
Dumping and reloading the file is making my application slow, and I'm looking for ways to avoid disk I/O completely. The options I have identified so far are a RAM disk or tmpfs. Can I somehow set up a RAM disk or tmpfs from within my application? Or is there any other way to avoid disk I/O completely and pass data between processes reliably?
Comments (7)
Create an anonymous shared memory region before forking; all children can then use it after the fork.
Be aware that you'll need some synchronization mechanism when sharing memory. One way to accomplish this is to put a mutex or semaphore inside the shared memory region.
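
The code that originally accompanied this answer is not preserved here. As a minimal sketch of the idea, assuming mmap() with MAP_SHARED | MAP_ANONYMOUS is what was meant: the parent maps the region before forking, child1 fills it, and child2 reads it. The function name handoff_via_anon_mmap and the exit-status check are illustrative, not part of the original answer.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Hypothetical helper: child1 writes msg into the shared region and exits,
 * then child2 checks that it sees the same bytes.
 * Returns 0 if child2 saw exactly msg, -1 otherwise. */
int handoff_via_anon_mmap(const char *msg)
{
    size_t len = strlen(msg) + 1;
    int ok = 0;

    /* Mapped before fork(), so every child inherits the same pages. */
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    pid_t child1 = fork();
    if (child1 == 0) {                 /* child1: produce the data */
        memcpy(region, msg, len);
        _exit(0);
    }
    if (child_ok(child1)) {            /* child1 is finished before child2 starts */
        pid_t child2 = fork();
        if (child2 == 0)               /* child2: consume the data */
            _exit(memcmp(region, msg, len) == 0 ? 0 : 1);
        ok = child_ok(child2);
    }

    munmap(region, len);
    return ok ? 0 : -1;
}
```

Because the parent waits for child1 before forking child2, the two never touch the region at the same time, so this sketch needs no mutex. If the children ran concurrently, the synchronization mentioned above would be required.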
If the two subprocesses do not run at the same time, pipes or sockets won't work for you: their buffers are too small for the 'huge binary file', and the first process will block until something reads the data.
In that case you need some kind of shared memory instead. You can use the SysV IPC shared memory API, the POSIX shared memory API (which internally uses tmpfs on recent Linux), or files on a tmpfs file system (usually mounted on /dev/shm, sometimes on /tmp) directly.
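
A minimal sketch of the last option (a plain file on tmpfs), assuming the caller passes a path that lives on a tmpfs mount such as /dev/shm. The helper name handoff_via_file and the exit-status check are illustrative. The code itself works on any file system; only the path decides whether the data stays in RAM.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Hypothetical helper: child1 dumps msg to path and exits, then child2 maps
 * the file and checks the contents. msg must not be empty.
 * Returns 0 if child2 saw exactly msg, -1 otherwise. */
int handoff_via_file(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    int ok = 0;

    pid_t child1 = fork();
    if (child1 == 0) {                 /* child1: write the dump */
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0 || write(fd, msg, len) != (ssize_t)len)
            _exit(1);
        close(fd);
        _exit(0);
    }
    if (child_ok(child1)) {
        pid_t child2 = fork();
        if (child2 == 0) {             /* child2: map the dump read-only */
            struct stat st;
            int fd = open(path, O_RDONLY);
            if (fd < 0 || fstat(fd, &st) != 0 || (size_t)st.st_size != len)
                _exit(1);
            const char *data = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            if (data == MAP_FAILED)
                _exit(1);
            _exit(memcmp(data, msg, len) == 0 ? 0 : 1);
        }
        ok = child_ok(child2);
    }

    unlink(path);                      /* frees the memory on tmpfs */
    return ok ? 0 : -1;
}
```

Calling it with a path such as /dev/shm/dump.bin (a made-up file name) keeps the existing write-then-read structure of the application and only changes where the file lives.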
A named pipe is exactly what you want. You can write data into it and read data from it as if it were a file, but there's no need to store it on disk.
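
A minimal sketch of the named-pipe approach, with illustrative helper names. One caveat that matters for the question: opening a FIFO for writing blocks until a reader opens it, so this only works if writer and reader are alive at the same time. The sketch therefore forks both children before waiting for either.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Writes all len bytes; returns 1 on success. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return 0;
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Reads fd until EOF; returns 1 if the bytes seen equal msg[0..len). */
static int drain_and_compare(int fd, const char *msg, size_t len)
{
    char buf[4096];
    size_t got = 0;
    int same = 1;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (!same || got + (size_t)n > len ||
            memcmp(buf, msg + got, (size_t)n) != 0)
            same = 0;
        got += (size_t)n;
    }
    return same && got == len;
}

/* Hypothetical helper: streams msg from a writer child to a reader child
 * through a FIFO at path. Returns 0 if the reader saw exactly msg. */
int stream_via_fifo(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t writer = fork();
    if (writer == 0) {
        int fd = open(path, O_WRONLY);   /* blocks until the reader opens */
        _exit(fd >= 0 && write_all(fd, msg, len) ? 0 : 1);
    }
    pid_t reader = writer > 0 ? fork() : -1;
    if (reader == 0) {
        int fd = open(path, O_RDONLY);   /* blocks until the writer opens */
        _exit(fd >= 0 && drain_and_compare(fd, msg, len) ? 0 : 1);
    }
    if (writer > 0 && reader < 0)
        kill(writer, SIGKILL);           /* do not leave the writer blocked */

    int ok_w = child_ok(writer);
    int ok_r = child_ok(reader);
    unlink(path);
    return ok_w && ok_r ? 0 : -1;
}
```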
You can use pipes or sockets, and take advantage of the sendfile() or splice() features of the Linux kernel (they can avoid data copying).
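
A minimal sketch of the sendfile() variant, assuming the data already sits in a file and a reader process is waiting on a Unix socket. The helper names are illustrative, and the file written at the top of sendfile_to_child only stands in for the existing dump. splice() would be used in a similar loop when one end is a pipe.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Reads fd until EOF; returns 1 if the bytes seen equal msg[0..len). */
static int drain_and_compare(int fd, const char *msg, size_t len)
{
    char buf[4096];
    size_t got = 0;
    int same = 1;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (!same || got + (size_t)n > len ||
            memcmp(buf, msg + got, (size_t)n) != 0)
            same = 0;
        got += (size_t)n;
    }
    return same && got == len;
}

/* Hypothetical helper: the parent pushes the file at path into a socket with
 * sendfile(); a child reads the socket and checks that it received msg.
 * Returns 0 on success, -1 otherwise. */
int sendfile_to_child(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    int sv[2];

    /* Stand-in for the dump file that already exists in the question. */
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    pid_t reader = fork();
    if (reader == 0) {
        close(sv[0]);
        _exit(drain_and_compare(sv[1], msg, len) ? 0 : 1);
    }
    close(sv[1]);

    /* sendfile() moves the bytes inside the kernel and advances offset. */
    off_t offset = 0;
    int sent_ok = reader > 0;
    while (sent_ok && (size_t)offset < len) {
        ssize_t n = sendfile(sv[0], fd, &offset, len - (size_t)offset);
        if (n <= 0)
            sent_ok = 0;
    }
    close(sv[0]);                      /* the reader sees EOF */
    close(fd);
    unlink(path);
    return child_ok(reader) && sent_ok ? 0 : -1;
}
```

Note that sendfile() avoids copying through user space, but the source here is still a file, so it helps most when that file is already on tmpfs or in the page cache.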
You can pass data between processes using pipes. Here is a good synopsis and example implementation.
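
The linked example is not preserved in this copy. As a minimal sketch of the same idea, with illustrative helper names: the parent creates an unnamed pipe, one child writes into it, and a second child reads from it. Both children have to run concurrently, because the pipe buffer is small and the writer blocks once it is full.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Writes all len bytes; returns 1 on success. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return 0;
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Reads fd until EOF; returns 1 if the bytes seen equal msg[0..len). */
static int drain_and_compare(int fd, const char *msg, size_t len)
{
    char buf[4096];
    size_t got = 0;
    int same = 1;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (!same || got + (size_t)n > len ||
            memcmp(buf, msg + got, (size_t)n) != 0)
            same = 0;
        got += (size_t)n;
    }
    return same && got == len;
}

/* Hypothetical helper: streams msg from a writer child to a reader child
 * through an unnamed pipe. Returns 0 if the reader saw exactly msg. */
int stream_via_pipe(const char *msg)
{
    size_t len = strlen(msg);
    int fds[2];                        /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    pid_t writer = fork();
    if (writer == 0) {
        close(fds[0]);
        _exit(write_all(fds[1], msg, len) ? 0 : 1);
    }
    pid_t reader = fork();
    if (reader == 0) {
        close(fds[1]);                 /* otherwise the reader never sees EOF */
        _exit(drain_and_compare(fds[0], msg, len) ? 0 : 1);
    }

    /* The parent must close both ends too, or the reader would hang. */
    close(fds[0]);
    close(fds[1]);
    int ok_w = child_ok(writer);
    int ok_r = child_ok(reader);
    return ok_w && ok_r ? 0 : -1;
}
```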
Spawn the two processes and have them transfer the data via sockets. TCP will be the easiest to get started with, but if you want a bit more efficiency, use Unix domain sockets. This assumes you don't care about the data being written to disk per se.
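
A minimal sketch of the Unix domain socket variant, with illustrative helper names and a caller-supplied socket path. The parent starts listening before it forks, so the writer's connect() cannot arrive before the socket exists. As with pipes, both children must be running at the same time.

```c
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Writes all len bytes; returns 1 on success. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return 0;
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Reads fd until EOF; returns 1 if the bytes seen equal msg[0..len). */
static int drain_and_compare(int fd, const char *msg, size_t len)
{
    char buf[4096];
    size_t got = 0;
    int same = 1;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (!same || got + (size_t)n > len ||
            memcmp(buf, msg + got, (size_t)n) != 0)
            same = 0;
        got += (size_t)n;
    }
    return same && got == len;
}

/* Hypothetical helper: a writer child connects to sock_path and sends msg;
 * a reader child accepts and checks it. Returns 0 if the reader saw msg. */
int stream_via_unix_socket(const char *sock_path, const char *msg)
{
    size_t len = strlen(msg);
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    if (strlen(sock_path) >= sizeof addr.sun_path)
        return -1;
    strcpy(addr.sun_path, sock_path);

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    unlink(sock_path);                 /* remove a stale socket file, if any */
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 1) != 0) {
        if (lfd >= 0)
            close(lfd);
        return -1;
    }

    pid_t reader = fork();
    if (reader == 0) {
        int cfd = accept(lfd, NULL, NULL);
        _exit(cfd >= 0 && drain_and_compare(cfd, msg, len) ? 0 : 1);
    }
    pid_t writer = reader > 0 ? fork() : -1;
    if (writer == 0) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
            _exit(1);
        _exit(write_all(fd, msg, len) ? 0 : 1);
    }
    if (reader > 0 && writer < 0)
        kill(reader, SIGKILL);         /* do not leave the reader in accept() */

    close(lfd);
    int ok_w = child_ok(writer);
    int ok_r = child_ok(reader);
    unlink(sock_path);
    return ok_w && ok_r ? 0 : -1;
}
```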
Since in your case the first child process, child1, exits before child2 comes into existence, socket communication or unnamed pipes will not help.
But shared memory will do the job:
In child1, create a shared memory segment with read permission for all and do the file dumping into that shared memory.
In child2, attach the shared memory segment to the current process's address space and read the dumped data.
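
The two steps above can be sketched with the SysV shared memory API, assuming that is what "attach" refers to. The helper name, the caller-chosen key, and the exit-status check are illustrative. The key must be nonzero and not already used by an unrelated segment.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    return pid > 0 && waitpid(pid, &status, 0) == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Hypothetical helper: child1 creates a segment under key and dumps msg into
 * it; child2 attaches it read-only and checks the contents.
 * Returns 0 if child2 saw exactly msg, -1 otherwise. */
int handoff_via_sysv_shm(key_t key, const char *msg)
{
    size_t len = strlen(msg) + 1;
    int ok = 0;

    pid_t child1 = fork();
    if (child1 == 0) {
        /* 0644: read/write for the owner, read-only for everyone else. */
        int id = shmget(key, len, IPC_CREAT | 0644);
        if (id < 0)
            _exit(1);
        char *mem = shmat(id, NULL, 0);
        if (mem == (void *)-1)
            _exit(1);
        memcpy(mem, msg, len);         /* the "file dump" goes here */
        _exit(0);                      /* the segment outlives child1 */
    }
    if (child_ok(child1)) {
        pid_t child2 = fork();
        if (child2 == 0) {
            int id = shmget(key, 0, 0);    /* look up the existing segment */
            if (id < 0)
                _exit(1);
            const char *mem = shmat(id, NULL, SHM_RDONLY);
            if (mem == (void *)-1)
                _exit(1);
            _exit(memcmp(mem, msg, len) == 0 ? 0 : 1);
        }
        ok = child_ok(child2);
    }

    /* SysV segments persist until they are removed explicitly. */
    int id = shmget(key, 0, 0);
    if (id >= 0)
        shmctl(id, IPC_RMID, NULL);
    return ok ? 0 : -1;
}
```

The explicit IPC_RMID at the end matters: unlike an anonymous mapping, a SysV segment keeps its memory after every process has exited, which is what lets child2 find the data but also means a forgotten segment leaks until reboot or manual removal.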