Memory optimization of child processes
I work on Linux on an ARM processor for a cable modem. I have written a tool that sends/storms customized UDP packets using raw sockets. I form the packets from scratch so that we have the flexibility to play with different options. The tool is mainly for stress-testing routers.
I actually have multiple interfaces created. Each interface obtains its IP address using DHCP. This is done in order to make the modem behave as virtual customer premises equipment (vCPE).
When the system comes up, I start the processes that are requested. Every process that I start continuously sends packets, so process 0 sends packets using interface 0, and so on. Each of these sender processes allows configuration (changing UDP parameters and other options at run time). That's the reason I decided to use separate processes.
I start these processes using fork and exec from the provisioning process of the modem.
The problem now is that each process takes up a lot of memory. Starting just 3 such processes causes the system to crash and reboot.
I have tried the following:
I had always assumed that pushing more code into shared libraries would help. So I tried moving many functions into a shared library and keeping minimal code in the processes, but to my surprise it made no difference. I also removed all the arrays and made them use the heap instead; that made no difference either. Maybe this is because the processes run continuously, so it does not matter whether the memory is on the stack or the heap? I suspect that the process from which I call fork is huge, and that is why the processes I create end up huge. I am not sure how else to go about it. Say process A is huge, and I start process B by fork and exec: B inherits A's memory area. So doing this instead -- A starts C, which in turn starts B -- will not help either, because C still inherits from A? I used vfork as an alternative, but that did not help either, and I do wonder why.
I would appreciate it if someone could give me tips to help reduce the memory used by each independent child process.
Given this is a test tool, then the most efficient thing to do is to add more memory to the testing machine.
Failing that:
Not technically answering your question, but providing a couple of alternative solutions:
If you are using Linux, have you considered using pktgen? It is a flexible tool for sending UDP packets from the kernel as fast as the interface allows. This is much faster than a userspace tool.
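For illustration, pktgen is driven through /proc/net/pktgen; a minimal session might look like the sketch below (the device name eth0, the destination address, and the counts are placeholders for your setup, and everything requires root):

```shell
# Load the pktgen kernel module
modprobe pktgen

# Bind a device to the first pktgen kernel thread
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0

# Configure the packet stream on that device
echo "count 100000"     > /proc/net/pktgen/eth0
echo "pkt_size 300"     > /proc/net/pktgen/eth0
echo "dst 192.168.1.1"  > /proc/net/pktgen/eth0

# Start transmission on all configured threads
echo "start" > /proc/net/pktgen/pgctrl

# Per-device results (packets sent, rate) appear here
cat /proc/net/pktgen/eth0
```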
Oh, and a shameless plug: I have made a multi-threaded network testing tool that can be used to spam the network with UDP packets. It can operate in multi-process mode (using fork) or multi-thread mode (using pthreads). The pthreads mode might use less RAM, so it might be better for your use. If anything, it might be worth looking at the source, as I've spent many years improving this code and it's been able to generate enough packets to saturate a 10 Gbps interface.
What could be happening is that the fork call in process A requires a significant amount of RAM + swap (if any). Thus, when you call fork() from this process, the kernel must reserve enough RAM and swap for the child process to have its own copy (copy-on-write, actually) of the parent process's writable private memory, namely its stack and heap. When you call exec() from the child process, that memory is no longer needed and your child process can have its own, smaller private working set.
So, the first thing to make sure of is that you don't have more than one process at a time in the state between fork() and exec(). It is during this state that the child process must have a duplicate of its parent process's virtual memory space.
Second, try using the overcommit settings which will allow the kernel to reserve more memory than actually exists. These are /proc/sys/vm/overcommit*. You can get away with using overcommit because your child processes only need the extra VM space until they call exec, and shouldn't actually touch the duplicated address space of the parent process.
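These tunables live under /proc/sys/vm; a sketch of inspecting and relaxing the policy (requires root, and mode 1 disables the kernel's safety accounting, so use it deliberately):

```shell
# 0 = heuristic (default), 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# Allow overcommit, so fork() from a large parent does not need a
# full RAM+swap reservation for pages that exec() will discard anyway
echo 1 > /proc/sys/vm/overcommit_memory

# Equivalent via sysctl; persist it in /etc/sysctl.conf if needed
sysctl -w vm.overcommit_memory=1
```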
Third, in your parent process you can allocate the largest blocks using shared memory, rather than the stack or heap, which are private. Thus, when you fork, those shared memory regions will be shared with the child process rather than duplicated copy-on-write.