Multiprocessing: writing to an HDF5 file
I am running parallelized code in Python and I am trying to save some values within each iteration. My code could be simplified/summarized as follows:

import multiprocessing as mp

import h5py

def func(a, b):
    # Generate some data and save it into "vector".
    # Create an HDF5 file and save the data in "vector".
    with h5py.File('/some_file.hdf5', 'w') as f:
        f.create_dataset('data_set', data=vector)
    # Some code

# Parallelize func
if __name__ == '__main__':
    with mp.Pool(2) as p:
        [p.apply_async(func, args=(elem, b)) for elem in big_array]
I am saving the files while parallelizing in order to save memory, since I will be working with large amounts of data.
However, every time I run the script, no HDF5 file is generated and no data is saved.
I am fairly new to parallelization in Python and I do not understand what the problem is.
Comments (1)
In the end I replaced the "with" command (the last two lines) with the following, and it worked!

It seems the previous code, with the "with" command, basically leaves the for loop as soon as the tasks have been assigned to each processor, and exits before all the calculations are done.