Is transfer learning with actor-critic networks possible across different observation and action spaces?
I have been experimenting with actor-critic networks such as SAC and TD3 on continuous control tasks, and I am trying to transfer the trained network to another task with a smaller observation and action space.
Would it be possible to do so if I saved the weights in a dictionary and then loaded them in the new environment? The Actor-Critic network would need to accept a state with different dimensions, and the actor would need to output an action with different dimensions.
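For concreteness, here is a minimal sketch of what I have in mind, assuming a simple MLP actor in PyTorch (the `Actor` architecture, the dimensions, and the file name `source_actor.pt` are all made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical actor: the real SAC/TD3 networks would differ.
class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

# Pretrained on the source task (e.g. obs_dim=17, act_dim=6).
source = Actor(obs_dim=17, act_dim=6)
torch.save(source.state_dict(), "source_actor.pt")

# Target task has a smaller observation and action space.
target = Actor(obs_dim=11, act_dim=3)

# Copy only the tensors whose shapes match; anything that doesn't match
# (here the first-layer weight and the output layer) keeps its fresh
# random initialization.
pretrained = torch.load("source_actor.pt")
target_state = target.state_dict()
transferable = {
    k: v for k, v in pretrained.items()
    if k in target_state and v.shape == target_state[k].shape
}
target_state.update(transferable)
target.load_state_dict(target_state)
print(f"Transferred {len(transferable)}/{len(target_state)} tensors")
```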
I have some experience fine-tuning transformer models by adding another classifier head and fine-tuning it, but how would I do this with Actor-Critic networks, given that the initial and final layers do not match those of the learned agent?
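In other words, I am imagining something like the following, again a hypothetical sketch rather than anything I know to be correct: keep the pretrained hidden trunk, attach a fresh input projection and output head sized for the new task, and optionally freeze the trunk while fine-tuning only the new layers (all names and dimensions below are illustrative):

```python
import torch
import torch.nn as nn

class AdaptedActor(nn.Module):
    def __init__(self, trunk, new_obs_dim, new_act_dim, hidden=256):
        super().__init__()
        self.input_proj = nn.Linear(new_obs_dim, hidden)  # fresh input layer
        self.trunk = trunk                                # pretrained hidden layers
        self.head = nn.Linear(hidden, new_act_dim)        # fresh output head

    def forward(self, obs):
        h = torch.relu(self.input_proj(obs))
        h = self.trunk(h)
        return torch.tanh(self.head(h))

# trunk stands in for the shape-compatible hidden layers of the
# pretrained actor, with their weights already loaded.
trunk = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
actor = AdaptedActor(trunk, new_obs_dim=11, new_act_dim=3)

# Freeze the transferred trunk so only the new layers train at first.
for p in actor.trunk.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in actor.parameters() if p.requires_grad], lr=3e-4
)
```

Is this a reasonable way to approach it, or is there a more standard technique for transferring actor-critic agents across mismatched spaces?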