Multiprocessing: writing to an HDF5 file
I am running parallelized code in Python and I am trying to save some values within each iteration. My code could be simplified/summarized as follows:

import multiprocessing as mp

import h5py

def func(a, b):
    # Generate some data and save it into "vector".
    # Create an HDF5 file and save the data in "vector".
    with h5py.File('/some_file.hdf5', 'w') as f:
        f.create_dataset('data_set', data=vector)
    # Some code

# Parallelize func
if __name__ == '__main__':
    with mp.Pool(2) as p:
        [p.apply_async(func, args=(elem, b)) for elem in big_array]
I am saving the files while parallelizing in order to save memory, since I will be working with large amounts of data.
However, every time I run the script, no HDF5 file is generated and no data is saved.
I am fairly new to parallelization in Python and I do not understand what the problem is.
Comments (1)
In the end I replaced the "with" command (the last two lines) with the following, and it worked!

It seems the previous code, with the "with" command, basically leaves the for loop as soon as the tasks have been assigned to each processor, and exits before all the calculations are done.