Python multiprocessing freezes as memory maxes out
I'm running this code:
# Get big (0.5GB) list of data
all_ftrajs = get_feature_trajs(traj_top_paths, hp_dict)
# Function I want to bootstrap
bs_func = partial(bs_func, all_ftrajs=all_ftrajs)
rng = get_rng(seed)
n_workers = min(n_cores, bs_samples)
results = []
if n_workers > 1:
    with Pool(n_workers) as pool:
        for i in range(bs_samples):
            # bootstrap list indices
            _, bs_ix = sample_trajectories(all_ftrajs, rng, bs_samples > 1)
            # accumulate results
            results.append(pool.apply_async(func=bs_func,
                                            args=(hp_dict, bs_ix, seed,
                                                  bs_dir.joinpath(f"{i}.pkl"), hp_idx),
                                            kwds=kwargs))
        # Get results
        for r in results:
            r.get()
        # close off pool
        pool.close()
        pool.join()
This code bootstraps some analysis 100 times (bs_samples = 100) by resampling the elements of a list of numpy arrays, all_ftrajs, and then analysing the resampled list:
def bs_func(hp_dict: Dict[str, List[Union[str, int]]],
            bs_ix: np.ndarray, seed: Union[int, None],
            out_dir: Path, hp_idx: int,
            lags: List[int], all_ftrajs: List[np.ndarray]):
    # Bootstrap the list of numpy arrays
    feat_trajs = [all_ftrajs[i] for i in bs_ix]
    # do the analysis
    try:
        tica, kmeans = discretize_trajectories(hp_dict, feat_trajs, seed)
        disc_trajs = kmeans.dtrajs
        mods_by_lag = estimate_msms(disc_trajs, lags)
        outputs = score_msms(mods_by_lag)
        outputs.ix = hp_idx
        write_outputs(outputs, out_dir)
    except Exception as e:
        logging.info(e)
    return True
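For context, the resampling step inside bs_func presumably consumes indices drawn with replacement, i.e. an ordinary bootstrap draw. A minimal sketch of how such indices could be produced (the helper bootstrap_indices below is illustrative, not my actual sample_trajectories implementation):

```python
import numpy as np

def bootstrap_indices(n_trajs: int, rng: np.random.Generator) -> np.ndarray:
    # Draw n_trajs indices uniformly with replacement: a standard bootstrap resample.
    return rng.choice(n_trajs, size=n_trajs, replace=True)

rng = np.random.default_rng(0)
bs_ix = bootstrap_indices(5, rng)
# bs_ix is a length-5 array of values in [0, 5); duplicates are expected,
# so the resampled list below may reference the same array more than once.
all_ftrajs = [np.zeros((10, 3)) for _ in range(5)]
feat_trajs = [all_ftrajs[i] for i in bs_ix]
```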
I run this bootstrapping over a series of different experiments (trials). After a handful of trials the program freezes:
- All processes are still active. I have 12 logical cores on my machine and I'm using a pool of size 2.
- Swap and RAM are maxed out (2 GB and 16 GB respectively).
- All CPU usage is 0.0%.
- When running this in serial, the single process uses no more than 10% of memory according to htop.
I have tried:
- Passing the big data list as a parameter to bs_func, using the partial construction inspired by the answer to this question: Shared-memory objects in multiprocessing.
- Running it in serial (this works).
I'm running Python 3.8 on Ubuntu 20.04.4 LTS, 64-bit. I have 15.5 GiB of memory and an AMD Ryzen 5 3600 6-core processor × 12 (copied from my 'About' section).
Questions:
- Is there an obvious fix to my code to make it work?
- Are there any other solutions that don't require too much refactoring of code?
Thanks and best wishes,
Rob