Printing parallel progress bars for child processes with Python map_async

Posted 2025-01-13 22:01:34


I have a list of pandas DataFrames. I want to run a Python function on each DataFrame in parallel, across all of my cores. My function looks like this:


from multiprocessing import Pool
from tqdm import tqdm

def f(df):
    result = None  # placeholder for whatever the processing produces
    for _, row in tqdm(df.iterrows(), total=len(df)):
        pass  # Do some processing
    return result


list_of_dataframes = [df1, df2, df3, df4]
ncores = 4
pool = Pool(ncores)
results = pool.map_async(f, list_of_dataframes)
pool.close()
pool.join()

However, I'm not seeing four progress bars updating in parallel, one per child process. I see only one bar being updated, and it jumps back and forth: for example, it advances to 5% and then drops back to 2%. I believe this is because all processes are updating the same bar.

I tried keeping a global progress bar and updating it inside each function call, like this:

from tqdm import tqdm 
from multiprocessing import Pool

list_of_dataframes = [df1, df2, df3, df4]
total_rows = len(df1) + len(df2) + len(df3) + len(df4)

def f(df):
    for _, row in df.iterrows():
        # Some processing
        pbar.update(1)
    return 1 

with tqdm(total=total_rows) as pbar: 
    list_of_dataframes = [df1, df2, df3, df4]
    ncores = 4
    pool = Pool(ncores)
    results = pool.map_async(f, list_of_dataframes)
    pool.close()
    pool.join()

But this doesn't work either: the progress bar behaves the same way. Is there any way to put a lock on the pbar variable in the code above so that only one process can update the progress bar at a time, or any way to show 4 progress bars in parallel?


难以启齿的温柔 2025-01-20 22:01:34

You can easily do what you need with the small parallelbar library.

For example:

import pandas as pd
import numpy as np

from parallelbar import progress_map


def foo(df):
    for row in df.iterrows():
        # do something
        pass
    return 'done'

if __name__ == '__main__':
    df_1 = pd.DataFrame(np.random.random((1000, 5)))
    df_2 = pd.DataFrame(np.random.random((1000, 5)))
    df_3 = pd.DataFrame(np.random.random((1000, 5)))
    df_4 = pd.DataFrame(np.random.random((1000, 5)))
    result = progress_map(foo, [df_1, df_2, df_3, df_4])

https://raw.githubusercontent.com/dubovikmaster/parallelbar/main/gifs/first_bar_.gif

You can also find more details here
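
If you would rather stay with plain tqdm, the two things asked about in the question (a lock, and one bar per process) can probably be combined. The sketch below is only an illustration under some assumptions: `process_frame` and `run_with_bars` are hypothetical names, the per-row work is a stand-in counter, and it assumes there are no more dataframes than terminal rows available, since each dataframe gets its own line through tqdm's `position` argument. The lock is shared with the workers through `tqdm.set_lock` and `tqdm.get_lock`.

```python
from multiprocessing import Pool, RLock

import pandas as pd
from tqdm import tqdm


def process_frame(args):
    """Iterate over one dataframe while drawing a bar on its own terminal row."""
    position, df = args
    processed = 0
    # `position` pins this bar to a fixed line, so bars do not overwrite each other.
    for _, row in tqdm(df.iterrows(), total=len(df),
                       desc=f"df {position}", position=position):
        processed += 1  # stand-in for the real per-row work
    return processed


def run_with_bars(frames, ncores=4):
    """Process every dataframe in a pool; each task draws its own bar."""
    tqdm.set_lock(RLock())  # one lock, shared so terminal writes are serialised
    with Pool(ncores, initializer=tqdm.set_lock,
              initargs=(tqdm.get_lock(),)) as pool:
        # enumerate() supplies the row index used as the bar position
        return pool.map(process_frame, list(enumerate(frames)))


if __name__ == "__main__":
    frames = [pd.DataFrame({"x": range(1000)}) for _ in range(4)]
    results = run_with_bars(frames)
```

The global `pbar` from the question cannot work as written because each worker process ends up with its own copy of the bar object, so the updates never reach a single shared counter. Giving every task its own bar sidesteps that, at the cost of one terminal line per dataframe.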
