Shared Memory or mmap - Linux C/C++ IPC
The context is Inter-Process Communication, where one process ("Server") has to send fixed-size structs to many listening processes ("Clients") running on the same machine.

I am very comfortable doing this in socket programming. To make the communication between the Server and the Clients faster and to reduce the number of copies, I want to try out using shared memory (shm) or mmap.

The OS is RHEL 64-bit.

Since I am a newbie, please suggest which I should use. I'd appreciate it if someone could point me to a book or online resource to learn the same.

Thanks for the answers. I wanted to add that the Server (Market Data Server) will typically be receiving multicast data, which will cause it to be "sending" about 200,000 structs per second to the "Clients", where each struct is roughly 100 bytes. Does a shm_open/mmap implementation outperform sockets only for large blocks of data, or for a large volume of small structs as well?
Answers (4)
I'd use mmap together with shm_open to map shared memory into the virtual address space of the processes. This is relatively direct and clean:

- you identify your shared memory segment with some kind of symbolic name, something like "/myRegion"
- with shm_open you open a file descriptor on that region
- with ftruncate you enlarge the segment to the size you need
- with mmap you map it into your address space

The shmat and Co. interfaces have (at least historically) the disadvantage that they may have a restriction on the maximal amount of memory that you can map.

Then, all the POSIX thread synchronization tools (pthread_mutex_t, pthread_cond_t, sem_t, pthread_rwlock_t, ...) have initialization interfaces that allow you to use them in a process-shared context, too. All modern Linux distributions support this.

Whether or not this is preferable over sockets? Performance-wise it could make a bit of a difference, since you don't have to copy things around. But the main point, I guess, would be that once you have initialized your segment, this is conceptually a bit simpler. To access an item you'd just take a lock on a shared lock, read the data, and then unlock the lock again.

As @R suggests, if you have multiple readers, pthread_rwlock_t would probably be the best lock structure to use.
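A minimal server-side sketch of those four steps, assuming the "/myRegion" name from the answer and a single record slot guarded by a process-shared pthread_rwlock_t; the struct layout and the 100-byte payload are illustrative, not something from the original post:

```c
/* Sketch: create the segment, size it, map it, and put a process-shared
 * rwlock plus one record slot inside it.  Link with -pthread (and -lrt on
 * older glibc). */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct region {
    pthread_rwlock_t lock;   /* shared by server and clients */
    uint64_t seq;            /* bumped on every update */
    char payload[100];       /* the ~100-byte struct from the question */
};

int main(void)
{
    /* 1. symbolic name -> file descriptor */
    int fd = shm_open("/myRegion", O_CREAT | O_RDWR, 0600);
    if (fd == -1) return 1;

    /* 2. size the segment */
    if (ftruncate(fd, sizeof(struct region)) == -1) return 1;

    /* 3. map it into this process's address space */
    struct region *r = mmap(NULL, sizeof(struct region),
                            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (r == MAP_FAILED) return 1;

    /* 4. initialize the lock once, marked process-shared (creator only) */
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&r->lock, &attr);
    pthread_rwlockattr_destroy(&attr);

    /* publish one record: write-lock, update, unlock */
    pthread_rwlock_wrlock(&r->lock);
    memcpy(r->payload, "example", 8);
    r->seq++;
    pthread_rwlock_unlock(&r->lock);

    munmap(r, sizeof(struct region));
    close(fd);
    return 0;
}
```

A client would shm_open the same name without O_CREAT, mmap it the same way, and take the lock with pthread_rwlock_rdlock() before reading the record.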
I once implemented an IPC library using shared memory segments; this allowed me to avoid a copy (instead of copying data from sender memory to kernel space, and then from kernel space to receiver memory, I could copy directly from sender memory to receiver memory).

Anyway, the results weren't as good as I was expecting: actually, sharing a memory segment was a really expensive process, since remapping TLB entries and all the rest is quite expensive. See this mail for more details (I'm not one of those guys, but I came across that mail while developing my library).

Results were good only for really big messages (say, more than a few megabytes); if you're working with little buffers, Unix sockets are the most optimized thing you can find, unless you are willing to write a kernel module.
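If you stay with the Unix-socket route this answer recommends for small buffers, a sender for the question's ~100-byte records might look roughly like the sketch below; the socket path "/tmp/mds.sock" and the struct are made up for illustration, and SOCK_DGRAM keeps each fixed-size record framed as one message:

```c
/* Hypothetical sender: pushes fixed-size records to one client over an
 * AF_UNIX datagram socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

struct tick { char payload[100]; };   /* the ~100-byte struct from the question */

int main(void)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1) return 1;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/mds.sock", sizeof(addr.sun_path) - 1);

    struct tick t = { "example" };
    /* one sendto() per record; each datagram arrives as one whole struct */
    if (sendto(fd, &t, sizeof(t), 0,
               (struct sockaddr *)&addr, sizeof(addr)) == -1)
        return 1;

    close(fd);
    return 0;
}
```

A client binds its own datagram socket to that path and calls recvfrom() into the same struct; with many clients the server issues one sendto() per client per record, which is where the copying the question asks about happens.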
Apart from what's been suggested already, I'd like to offer another method: IPv6 node/interface-local multicast, i.e. a multicast constrained to the loopback interface.

http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml#ipv6-multicast-addresses-1

At first this might seem quite heavyweight, but most OSes implement loopback sockets in a zero-copy architecture. The page(s) mapped to the buf parameter passed to send will be assigned an additional mapping and marked as copy-on-write, so that if the sending program overwrites the data therein, or deallocates it, the contents will be preserved.

Instead of passing raw structs you should use a robust data structure. Netstrings (http://cr.yp.to/proto/netstrings.txt) and BSON (http://bsonspec.org/) come to mind.
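A rough receiver sketch for that approach; the group ff01::1:2345, the port 4242, and the "lo" interface name are assumptions for illustration, and the server would simply send() its records to the same group and port:

```c
/* Hypothetical client: joins an interface-local IPv6 multicast group on the
 * loopback interface and receives fixed-size records. */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd == -1) return 1;

    /* bind to the port the server multicasts to */
    struct sockaddr_in6 local;
    memset(&local, 0, sizeof(local));
    local.sin6_family = AF_INET6;
    local.sin6_port = htons(4242);
    local.sin6_addr = in6addr_any;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) == -1) return 1;

    /* join the interface-local (ff01::/16) group on the loopback interface */
    struct ipv6_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    inet_pton(AF_INET6, "ff01::1:2345", &mreq.ipv6mr_multiaddr);
    mreq.ipv6mr_interface = if_nametoindex("lo");
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) == -1)
        return 1;

    char record[100];                /* the ~100-byte struct from the question */
    recv(fd, record, sizeof(record), 0);

    close(fd);
    return 0;
}
```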
Choosing between the POSIX shm_open/mmap interface and the older System V shmop one won't make a big difference, because after the initialization system calls you end up with the same situation: a memory area that is shared between various processes. If your system supports it, I'd recommend going with shm_open/mmap, because this is a better-designed interface.

You then use the shared memory area as a common blackboard where all processes can scribble their data. The difficult part is synchronizing the processes accessing this area. Here I recommend avoiding concocting your own synchronization scheme, which can be fiendishly difficult and error-prone. Instead, use your existing working socket-based implementation for synchronizing access between processes, and use the shared memory only for transferring large amounts of data between processes. Even with this scheme you'll need a central process to coordinate the allocation of buffers, so this scheme is worth it only if you have very large volumes of data to transfer. Alternatively, use a synchronization library, like Boost.Interprocess.
, because this is a better designed interface.You then use the shared memory area as a common blackboard where all processes can scribble their data. The difficult part is to synchronize the processes accessing this area. Here I recommend to avoid concocting your own synchronization scheme, which can be fiendishly difficult and error-prone. Instead, use the existing working socket-based implementation for synchronizing access between processes, and use the shared memory only for transferring large amounts of data between processes. Even with this scheme you'll need a central process to coordinate the allocation of buffers, so this scheme is worth it only if you have very large volumes of data to transfer. Alternatively, use a synchronization library, like Boost.Interprocess.