Learning the structure of hierarchical reinforcement learning tasks
I've been studying hierarchical reinforcement learning problems, and while a lot of papers propose interesting ways of learning a policy, they all seem to assume they know in advance a graph structure describing the actions in the domain. For example, The MAXQ Method for Hierarchical Reinforcement Learning by Dietterich describes a complex graph of actions and sub-tasks for a simple Taxi domain, but not how this graph was discovered. How would you learn the hierarchy of this graph, and not just the policy?
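To make the assumption concrete: the structure these papers take as given can be written down by hand as something like the following sketch of the Taxi task graph (Python, names purely illustrative and not from any particular implementation):

    # A hand-written sketch of a MAXQ-style task graph for the Taxi domain.
    # Leaves are primitive actions; internal nodes are composite subtasks.
    # The point of the question: papers assume a structure like this exists,
    # but don't explain how to discover it from experience.
    TAXI_HIERARCHY = {
        "Root":     ["Get", "Put"],
        "Get":      ["Pickup", "Navigate"],
        "Put":      ["Putdown", "Navigate"],
        "Navigate": ["North", "South", "East", "West"],  # parameterised by a target location in the paper
        # Primitive actions have no children.
        "Pickup": [], "Putdown": [],
        "North": [], "South": [], "East": [], "West": [],
    }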
In Dietterich's MAXQ, the graph is constructed manually. It's considered to be a task for the system designer, in the same way that coming up with a representation space and a reward function is.
Depending on what you're trying to achieve, you might want to automatically decompose the state space, learn relevant features, or transfer experience from simple tasks to more complex ones.
I'd suggest you just start reading papers that refer to the MAXQ one you linked to. Without knowing exactly what you want to achieve, I can't be very prescriptive (and I'm not really on top of all the current RL research), but you might find relevant ideas in the work of Luo, Bell & McCollum, or in the papers by Madden & Howley.
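As one concrete example of automatic decomposition (not something the papers above commit to, just a common heuristic from the subgoal-discovery literature): look for "bottleneck" states that successful trajectories keep passing through, and promote them to subtask goals. A minimal sketch, assuming trajectories are lists of hashable states:

    from collections import Counter

    def find_bottlenecks(trajectories, top_k=3):
        """Return the states visited by the most successful trajectories.
        A crude illustration of frequency-based subgoal discovery, not the
        method of any specific paper mentioned here."""
        counts = Counter()
        for traj in trajectories:
            # Count each state once per trajectory so long loops don't dominate.
            counts.update(set(traj))
        return [state for state, _ in counts.most_common(top_k)]

In the Taxi domain, the states where the passenger is picked up or dropped off tend to surface this way, which is roughly where MAXQ's Get/Put split sits.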
This paper describes one approach that is a good starting point:
N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic Discovery and Transfer of MAXQ Hierarchies. In International Conference on Machine Learning, 2008.
http://web.engr.oregonstate.edu/~mehtane/papers/hi-mat.pdf
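Very loosely, HI-MAT starts from a single successful trajectory and groups actions into candidate subtasks according to which state variables they affect, using DBN action models and a relevance analysis to justify the grouping. The sketch below is only meant to convey that "group by affected variables" flavour and is not the algorithm from the paper (it assumes factored, dict-like states):

    def changed_vars(s, s_next):
        """State variables whose value differs between two factored states."""
        return frozenset(k for k in s if s[k] != s_next[k])

    def segment_by_changed_variables(transitions):
        """transitions: list of (state, action, next_state) triples from one
        successful episode. Group consecutive actions that change the same
        set of variables into one candidate subtask. A toy illustration only;
        HI-MAT additionally relies on DBN action models, relevance checks,
        and validation of the resulting hierarchy."""
        segments, current, current_vars = [], [], None
        for s, a, s_next in transitions:
            vars_ = changed_vars(s, s_next)
            if current and vars_ != current_vars:
                segments.append((current_vars, current))
                current = []
            current.append(a)
            current_vars = vars_
        if current:
            segments.append((current_vars, current))
        return segments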
Say there is this agent out there moving about doing things. You don't know its internal goals (task graph). How do you infer its goals?
In a way, this is impossible. Just as it is impossible for me to know what goal you had in mind when you put that box down: maybe you were tired, maybe you saw a killer bee, maybe you had to pee...
You are trying to model an agent's internal goal structure. In order to do that you need some sort of guidance as to what the set of possible goals is and how these are represented by actions. In the research literature this problem has been studied under the term "plan recognition", and also with the use of POMDPs (partially observable Markov decision processes), but both of these techniques assume you do know something about the other agent's goals.
If you don't know anything about its goals, all you can do is either infer one of the above models (this is what we humans do: I assume others have the same goals I do. I never think, "Oh, he dropped his laptop, he must be ready to lay an egg", because he's a human.) or model it as a black box: a simple state-to-action function, then add internal states as needed (hmm, someone must have written a paper on this, but I don't know who).
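That black-box option can be as simple as fitting an empirical state-to-action mapping from observed behaviour, and only adding hidden internal state once the simple model stops predicting well. A toy sketch (purely illustrative):

    from collections import Counter, defaultdict

    class BlackBoxPolicyModel:
        """Model the observed agent as a state -> action-distribution table,
        estimated by counting. No goals are inferred; the model just predicts
        what the agent tends to do in states it has already been seen in.
        Adding internal state, as suggested above, would mean fitting
        something like an HMM or POMDP instead."""

        def __init__(self):
            self.counts = defaultdict(Counter)

        def observe(self, state, action):
            self.counts[state][action] += 1

        def predict(self, state):
            seen = self.counts.get(state)
            if not seen:
                return None  # never observed this state
            action, _ = seen.most_common(1)[0]
            return action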