Python并行处理快速填满内存

发布于 2025-01-14 13:42:28 字数 911 浏览 2 评论 0原文

我正在对 ELF 可执行文件（用于 CTF）上的 8 位引脚进行暴力破解，并且正在使用异步并行处理。代码速度非常快，但填满内存的速度更快。大约需要总迭代的 10% 才能填满 8GB 的内存，我不知道是什么原因造成的。有什么帮助吗？

from pwn import *
import multiprocessing as mp
from tqdm import tqdm

def check_pin(pin):
    program = process('elf_exe')
    program.recvn(36)

    program.sendline(str(pin))

    program.recvline()
    program.recvline()
    res = program.recvline()
    
    program.close()

    if 'Access denied.' in str(res):
        return null, null
    else:
        return res, pin

def process_result(res, pin):
    if(res != null):
        print(pin)

if __name__ == '__main__':
    print(f'Starting bruteforce on {mp.cpu_count()} cores :)\n')
    pool = mp.Pool(mp.cpu_count())  

    min = 10000000
    max = 99999999
    for pin in tqdm(range(min, max)):
        pool.apply_async(check_pin, args=(pin), callback=process_result)

    pool.close()
    pool.join()

原文

I'm bruteforcing a 8-digit pin on a ELF executable (it's for a CTF) and I'm using asynchronous parallel processing. The code is very fast but it fills the memory even faster.
It takes about 10% of the total iterations to fill 8gbs of ram, and I have no idea what's causing it. Any help?

from pwn import *
import multiprocessing as mp
from tqdm import tqdm

def check_pin(pin):
    program = process('elf_exe')
    program.recvn(36)

    program.sendline(str(pin))

    program.recvline()
    program.recvline()
    res = program.recvline()
    
    program.close()

    if 'Access denied.' in str(res):
        return null, null
    else:
        return res, pin

def process_result(res, pin):
    if(res != null):
        print(pin)

if __name__ == '__main__':
    print(f'Starting bruteforce on {mp.cpu_count()} cores :)\n')
    pool = mp.Pool(mp.cpu_count())  

    min = 10000000
    max = 99999999
    for pin in tqdm(range(min, max)):
        pool.apply_async(check_pin, args=(pin), callback=process_result)

    pool.close()
    pool.join()

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

肥爪爪 2025-01-21 13:42:28

多处理池创建多个进程。调用 apply_async 创建一个添加到共享数据结构（例如队列）的任务。借助进程间通信 (IPC)，进程可以读取数据结构。问题是 apply_async 返回一个您不使用的同步对象，因此没有同步。数据结构中附加的项目需要一些内存空间（由于分配了 3 个 CPython 对象，因此至少 32*3=96 字节），并且数据结构在内存中增长以容纳 89_999_999 个项目，因此至少需要 8 GiB 的 RAM。该过程不够快，无法执行工作。 tqdm print 完全是一种误导：它只是打印提交的任务数量的处理，而不是执行的一小部分。当tqdm打印100%并且提交循环完成时，几乎所有工作都完成了。我实际上怀疑“代码非常快”，因为它似乎运行 9000 万个进程，而运行一个进程被认为是一项昂贵的操作。

为了加速此代码并避免大量内存使用，您需要将工作聚合为更大的任务。例如，您可以计算一系列 pin 变量，并在 check_pin 中添加循环。合理的范围大小例如为 1000。此外，您需要将 apply_async 返回的 AsyncResult 对象累积在列表中，并在以下情况下执行定期同步：该列表变得太大，因此进程没有太多工作，因此共享数据结构可以保持较小。这是一个未经测试的简单示例：

lst = []
for rng in allRanges:
    lst.append(pool.apply_async(check_pin, args=(rng), callback=process_result))
    if len(lst) > 100:
        # Naive synchronization
        for i in lst:
            i.wait()
        lst = []

Multiprocessing pools create several processes. Calls to apply_async create a task that is added to a shared data structure (eg. queue). The data structure is read by processes thanks to inter-process communication (IPC). The thing is apply_async return a synchronization object that you do not use and so there is not synchronizations. Items appended in the data structure take some memory space (at least 32*3=96 bytes due to 3 CPython objects being allocated) and the data structure grow in memory to hold the 89_999_999 items hence at least 8 GiB of RAM. The process are not fast enough to execute the work. What tqdm print is totally is completely misleading: it just print the processing of the number of task submitted, not the one executed that is only a tiny fraction. Almost all the work is done when tqdm print 100% and the submission loop is done. I actually doubt the "code is very fast" since it appears to run 90 millions process while running a process is known to be an expensive operation.

To speed up this code and avoid a big memory usage, you need to aggregate the work in bigger tasks. You can for example and a range of pin variable to be computed and add a loop in check_pin. A reasonable range size is for example 1000. Additionally, you need to accumulate the AsyncResult objects returned by apply_async in a list and perform periodic synchronizations when the list becomes too big so that processes does not have too much work and so the shared data structure can remain small. Here is a simple untested example:

lst = []
for rng in allRanges:
    lst.append(pool.apply_async(check_pin, args=(rng), callback=process_result))
    if len(lst) > 100:
        # Naive synchronization
        for i in lst:
            i.wait()
        lst = []

回复收藏 0 原文

~没有更多了~