HMM for perspective estimation in document images, can't understand the algorithm
Here is a paper about estimating the perspective of a binary image containing text and some noise or non-text objects.
The algorithm uses a Hidden Markov Model with two states:
T - text
B - background (i.e. noise)
The algorithm itself is hard to understand. My question is this:
I've read about Hidden Markov Models and I know that they rely on probabilities that must be known.
But in this algorithm I can't understand how, if they use an HMM, they obtain those probabilities (for example, the probability of changing from state S1 to another state S2).
I also didn't find anything about training in that paper.
So, if somebody understands it, please tell me.
Also, is it possible to use an HMM without knowing the state transition probabilities?
EDIT:
Maybe they are using some kind of estimation without knowing the HMM parameters (the probabilities).
Comments (1)
Perhaps this is a little too academic, more related to applied mathematics than to programming?
Anyway: HMMs are traditionally trained (i.e., the values of the model's parameters, in this case the probabilities, are learned) using a database of already classified data. See the Baum-Welch algorithm. This division into two phases, learning (or training) first with classified/labelled data and classification (the real work) afterwards with unclassified data, is typical of many algorithms and is called supervised classification.
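To make the supervised case concrete, here is a minimal sketch (not the method from the paper; the toy labelled data and variable names are invented for illustration) of how the transition and emission probabilities of a two-state T/B model can be estimated by simple counting over already-labelled sequences:

```python
import numpy as np

# Hypothetical labelled training data: for each observation sequence we also
# know the true state sequence (0 = T/text, 1 = B/background).
# Observations here are binary values (0 or 1), purely for illustration.
state_seqs = [np.array([0, 0, 1, 1, 0]), np.array([1, 1, 0, 0, 0])]
obs_seqs   = [np.array([1, 1, 0, 0, 1]), np.array([0, 0, 1, 1, 1])]

n_states, n_symbols = 2, 2
trans_counts = np.zeros((n_states, n_states))
emit_counts  = np.zeros((n_states, n_symbols))
start_counts = np.zeros(n_states)

for states, obs in zip(state_seqs, obs_seqs):
    start_counts[states[0]] += 1
    for t in range(len(states) - 1):
        trans_counts[states[t], states[t + 1]] += 1   # count S1 -> S2 transitions
    for s, o in zip(states, obs):
        emit_counts[s, o] += 1                        # count symbol emissions per state

# Maximum-likelihood estimates: normalize the counts row by row.
A  = trans_counts / trans_counts.sum(axis=1, keepdims=True)   # transition matrix
B  = emit_counts  / emit_counts.sum(axis=1, keepdims=True)    # emission matrix
pi = start_counts / start_counts.sum()                        # initial distribution

print("Transition probabilities:\n", A)
print("Emission probabilities:\n", B)
print("Initial distribution:\n", pi)
```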
On the other hand, sometimes we don't have 'known' (classified) data, so we must resort to unsupervised classification, in which we try to learn the model and classify at the same time. This is much more limited, and usually implies making many simplifications and reducing the number of parameters of the model (so that there is less to learn).
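For the unsupervised case, a bare-bones sketch would be the Baum-Welch (EM) re-estimation below, which learns the parameters of a two-state HMM with binary observations directly from an unlabelled sequence. Again, this only illustrates the general technique under assumed toy data, not the algorithm used in the article:

```python
import numpy as np

def baum_welch(obs, n_states=2, n_symbols=2, n_iter=50, seed=0):
    """Unsupervised (EM) estimation of HMM parameters from a single
    unlabelled observation sequence. A bare-bones sketch only."""
    rng = np.random.default_rng(seed)
    T = len(obs)
    # Random initial guesses for the parameters, each row normalized to sum to 1.
    A  = rng.random((n_states, n_states));  A  /= A.sum(axis=1, keepdims=True)
    B  = rng.random((n_states, n_symbols)); B  /= B.sum(axis=1, keepdims=True)
    pi = rng.random(n_states);              pi /= pi.sum()

    for _ in range(n_iter):
        # E-step: forward-backward with per-step scaling to avoid underflow.
        alpha = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]

        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # Posterior state probabilities and expected transition counts.
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)

        xi = np.zeros((T - 1, n_states, n_states))
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
            xi[t] /= xi[t].sum()

        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]

    return pi, A, B

# Toy unlabelled sequence of binary observations.
obs = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0])
pi, A, B = baum_welch(obs)
print("pi:\n", pi, "\nA:\n", A, "\nB:\n", B)
```

Because EM only finds a local optimum, the result depends on the random initialization, which is part of why unsupervised training usually needs the simplified, parameter-reduced model mentioned above.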
At first glance, this seems to be the path taken in that article: they don't consider a fully general HMM, but a very restricted one, and try to find a good fit for that model. But, again, I haven't read it carefully.