Reinforcement learning and POMDP

Posted on 2024-08-31 04:04:24

  • I am trying to use a multi-layer neural network (NN) to implement a probability function for a partially observable Markov decision process (POMDP).
  • I thought the inputs to the NN would be: the current state, the selected action, and the resulting state.
    The output is a probability in [0, 1] (the probability that performing the selected action in the current state leads to the resulting state).
  • In training, I fed these inputs into the NN and taught it an output of 1.0 for each case that had actually occurred.

The problem:
For nearly all test cases the output probability is near 0.95, and no output was under 0.9!
Even for nearly impossible results, it gave a probability that high.

PS: I think this is because I taught it only the cases that happened, not the ones that did not.
But I cannot, at each step of the episode, teach it an output of 0.0 for every action that did not happen!

Any suggestions on how to overcome this problem? Or maybe there is another way to use an NN, or to implement the probability function?

Thanks

Comments (2)

极度宠爱 2024-09-07 04:04:24

The problem is that the probabilities of all possible following states have to sum to 1. If you construct your network like that, this is not guaranteed. Two possible alternatives come to mind, both assuming discrete states.

  1. When making a prediction, run the network once for each possible following state. Afterwards, normalize by dividing each output by the sum of all outputs.
  2. Use one output per possible following state. You can then use a softmax layer (as in classification) and interpret the resulting values, which range from 0 to 1 and sum to 1, as probabilities.

These two are actually roughly equivalent from a mathematical perspective.
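As a rough illustration of option 2, here is a minimal sketch in plain Python. It is not the asker's network: the toy environment (`true_next_state_probs`), the class name `SoftmaxTransitionModel`, and the learning rate are all assumptions made up for this example. To keep it short, the "network" is a single linear layer on a one-hot (state, action) input, which reduces to a table of logits; a multi-layer NN would replace that table but keep the softmax output and the cross-entropy update.

```python
import math
import random

def softmax(logits):
    """Turn real-valued logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class SoftmaxTransitionModel:
    """One logit per next state for every (state, action) pair.

    Hypothetical stand-in for a network with a softmax output layer.
    """

    def __init__(self, n_states, n_actions):
        self.n_states = n_states
        self.n_actions = n_actions
        self.logits = [[0.0] * n_states for _ in range(n_states * n_actions)]

    def predict(self, state, action):
        """Return P(next_state | state, action) for all next states."""
        return softmax(self.logits[state * self.n_actions + action])

    def update(self, state, action, next_state, lr=0.05):
        """One SGD step on the cross-entropy loss for an observed transition."""
        probs = self.predict(state, action)
        row = self.logits[state * self.n_actions + action]
        for j in range(self.n_states):
            target = 1.0 if j == next_state else 0.0
            # Gradient of cross-entropy w.r.t. logit j is probs[j] - target,
            # so states that did NOT occur are pushed down automatically.
            row[j] -= lr * (probs[j] - target)

def true_next_state_probs(state, action, n_states=3):
    """Assumed toy dynamics: stay put or move to the next state."""
    probs = [0.0] * n_states
    stay = 0.8 if action == 0 else 0.1
    probs[state] = stay
    probs[(state + 1) % n_states] = 1.0 - stay
    return probs

def train_on_toy_data(n_samples=6000, seed=0):
    """Sample transitions from the toy dynamics and fit the model to them."""
    rng = random.Random(seed)
    model = SoftmaxTransitionModel(n_states=3, n_actions=2)
    for _ in range(n_samples):
        s = rng.randrange(3)
        a = rng.randrange(2)
        s_next = rng.choices(range(3), weights=true_next_state_probs(s, a))[0]
        model.update(s, a, s_next)
    return model

if __name__ == "__main__":
    model = train_on_toy_data()
    print(model.predict(0, 0))  # compare with the assumed true [0.8, 0.2, 0.0]
```

Only transitions that actually happened are ever shown to the model, yet the softmax forces probability mass away from the next states that were not observed, so no explicit 0.0 targets are needed.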

In the case of continuous variables, you will have to assume a distribution (e.g. a multivariate Gaussian) and use the parameters of that distribution (e.g. the mean and the covariance or standard deviation) as outputs.
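For the continuous case, a one-dimensional sketch may make the idea concrete. Everything here is an assumption for illustration: the state is a single real number, the model is linear (`w * state + b`) rather than a multi-layer NN, and the spread is parameterized by a log standard deviation so it stays positive. The model outputs the Gaussian's parameters and is trained by minimizing the negative log-likelihood of the observed next states.

```python
import math
import random

def gaussian_nll(x, mean, log_std):
    """Negative log-likelihood of x under N(mean, exp(log_std)^2)."""
    var = math.exp(2.0 * log_std)
    return 0.5 * math.log(2.0 * math.pi) + log_std + 0.5 * (x - mean) ** 2 / var

def fit_linear_gaussian(data, epochs=1000, lr=0.05):
    """Fit next_state ~ N(w * state + b, std^2) by gradient descent on the mean NLL.

    data is a list of (state, next_state) pairs of floats.
    Returns (w, b, std).
    """
    w, b, log_std = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        var = math.exp(2.0 * log_std)
        grad_w = grad_b = grad_s = 0.0
        for s, s_next in data:
            err = s_next - (w * s + b)
            grad_w += -err * s / var          # d NLL / d w
            grad_b += -err / var              # d NLL / d b
            grad_s += 1.0 - err * err / var   # d NLL / d log_std
        w -= lr * grad_w / n
        b -= lr * grad_b / n
        log_std -= lr * grad_s / n
    return w, b, math.exp(log_std)

if __name__ == "__main__":
    # Assumed toy dynamics: next = 0.9 * state + 0.1 + Gaussian noise (std 0.5).
    rng = random.Random(0)
    data = []
    for _ in range(400):
        s = rng.uniform(-1.0, 1.0)
        data.append((s, 0.9 * s + 0.1 + rng.gauss(0.0, 0.5)))
    print(fit_linear_gaussian(data))
```

The fitted density can then be evaluated for any candidate next state with `math.exp(-gaussian_nll(x, mean, log_std))`; note that this is a density, not a probability in [0, 1].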

带刺的爱情 2024-09-07 04:04:24

When fitting the NN you might want to fit a wider range of data. Is there any data in the training set that you want fitted to a probability closer to 0? If there isn't, I suspect you will get poor results. As a first step I'd try varying what goes into the training data set.

Also, how are you training the NN? Have you tried other training methods? How about the activation functions? Perhaps experiment with some different ones.

With neural nets I think some trial and error when choosing the model is going to help. (Sorry if all this isn't specific enough.)
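The point about missing near-0 targets can be reproduced on a toy problem. The sketch below is an assumption-laden illustration, not the asker's setup: a single sigmoid unit with a shared bias, one-hot features for four hypothetical next states, and made-up data in which only next states 0 and 1 ever occur. Trained on targets of 1.0 only, the unit learns a large bias and rates even the never-seen next state 3 above 0.9. Adding a few examples with target 0.0 brings that output down. In a real problem one would have to sample such negatives rather than enumerate them, and the outputs would still not be normalized probabilities.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def predict(w, b, x):
    """Output of a single sigmoid unit with weights w and bias b."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def train(examples, n_features, epochs=200, lr=0.5):
    """Batch gradient descent on the cross-entropy loss of (features, target) pairs."""
    w = [0.0] * n_features
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in examples:
            diff = predict(w, b, x) - t  # gradient w.r.t. the pre-activation
            for i, xi in enumerate(x):
                grad_w[i] += diff * xi
            grad_b += diff
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: next states 0 and 1 were observed, 2 and 3 never were.
positives_only = [(one_hot(0, 4), 1.0), (one_hot(1, 4), 1.0)]
with_negatives = positives_only + [(one_hot(2, 4), 0.0), (one_hot(3, 4), 0.0)]

if __name__ == "__main__":
    w, b = train(positives_only, 4)
    print("positives only, unseen state 3:", predict(w, b, one_hot(3, 4)))
    w, b = train(with_negatives, 4)
    print("with negatives, unseen state 3:", predict(w, b, one_hot(3, 4)))
```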
