I took the multiprocessing example for Stable Baselines 3 and everything was fine.
https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb#scrollTo=pUWGZp3i9wyf
Multiprocessed training took approximately 3.6x less time than single-process training with num_cpu=4.
But when I try to use PPO instead of A2C, and BipedalWalker-v3 instead of CartPole-v1, I see worse performance in multiprocessing mode. My question is: what am I doing wrong? Why is it slower?
My code is:
import gym
import time

from stable_baselines3 import PPO
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

env_name = "BipedalWalker-v3"
num_cpu = 4
n_timesteps = 10000

# "Multiprocessed" version: 4 environments vectorized via make_vec_env
env = make_vec_env(env_name, n_envs=num_cpu)
model = PPO('MlpPolicy', env, verbose=0)

start_time = time.time()
model.learn(n_timesteps)
total_time_multi = time.time() - start_time

print(f"Took {total_time_multi:.2f}s for multiprocessed version - {n_timesteps / total_time_multi:.2f} FPS")

# Single-process version: one environment created directly from the env id
single_process_model = PPO('MlpPolicy', env_name, verbose=0)

start_time = time.time()
single_process_model.learn(n_timesteps)
total_time_single = time.time() - start_time

print(f"Took {total_time_single:.2f}s for single process version - {n_timesteps / total_time_single:.2f} FPS")
print("Multiprocessed training is {:.2f}x faster!".format(total_time_single / total_time_multi))
The output is:
Took 16.39s for multiprocessed version - 610.18 FPS
Took 14.19s for single process version - 704.80 FPS
Multiprocessed training is 0.87x faster!
Comments (1)
You can try to pass the SubprocVecEnv class as the vec_env_cls argument of make_vec_env. By default, make_vec_env uses the DummyVecEnv wrapper to vectorize the environment. This does not actually create subprocesses; it steps each environment in sequence within the main process.
That is fine for simple environments (such as CartPole), where the overhead of multiprocessing outweighs the environment computation time, but for more computationally heavy environments, SubprocVecEnv is better (as it creates actual subprocesses).
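A minimal sketch of that change, reusing the env_name, num_cpu, and n_timesteps values from the question (the if __name__ == "__main__" guard is included because SubprocVecEnv spawns worker processes):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # vec_env_cls=SubprocVecEnv creates one worker process per environment
    # instead of the default DummyVecEnv, which steps them sequentially.
    env = make_vec_env("BipedalWalker-v3", n_envs=4, vec_env_cls=SubprocVecEnv)
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(10000)

Even with SubprocVecEnv, the gain over a single process may stay modest for such a short run (10000 timesteps), since spawning workers and inter-process communication add a fixed overhead.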