Are there any good explanations of statistical machine translation?
I'm trying to find a good high-level explanation of how statistical machine translation (SMT) works. That is, supposing I have a corpus of non-aligned English, French and German texts, how could I use it to translate any sentence from one language to another? I'm not looking to build a Google Translate myself, but I'd like to understand how it works in more detail.
I've searched Google but found nothing good: everything either quickly requires advanced mathematics to follow or is far too general. Wikipedia's article on SMT manages to be both, so it doesn't really help much. I'm skeptical that this area is so complex that it simply can't be understood without all the mathematics.
Can anyone give, or point me to, a general step-by-step explanation of how such a system works, aimed at programmers (so code examples are fine) but without needing a mathematics degree to understand? A book along these lines would be great too.
Edit: A perfect example of what I'm looking for would be an SMT equivalent of Peter Norvig's great article on spelling correction. It gives a good idea of what's involved in writing a spell checker, without going into the detailed maths of Levenshtein/soundex/smoothing algorithms etc.
Comments (3)
Here is a nice video lecture (in 2 parts):
http://videolectures.net/aerfaiss08_koehn_pbfs/
For in-depth details, I highly recommend this book:
http://www.amazon.com/Statistical-Machine-Translation-Philipp-Koehn/dp/0521874157
Both are by Philipp Koehn, who created the most widely used MT system in research. The book covers all the fundamentals and is accurate and very well explained. It is probably one of the de facto standard books that any researcher starting out in this field should read.
The Atlantic Online had a very straightforward nontechnical description of statistical machine translation back in December 1998:
I've read nontechnical material on statistical MT before, but I always wondered: "Yeah, but how does the statistical stuff know which words map to which, when word orders vary and supposedly no dictionary and no grammar are used?" This article actually answers that question, simply and straightforwardly, which quite surprised me.
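Since the question asked for something programmer-oriented, here is a rough sketch of one standard way to attack the "which words map to which" problem. It is my own illustration, not taken from the article: a stripped-down, IBM Model 1 style expectation-maximization loop (no NULL word, no smoothing). It assumes the corpus is already sentence-aligned, i.e. you know which sentence is a translation of which, and the names `train_word_translation`, `best_source_word` and `toy_corpus` are made up for this example.

```python
from collections import defaultdict


def train_word_translation(pairs, iterations=20):
    """Estimate t(f | e) with EM, in the style of IBM Model 1 (no NULL word).

    pairs: list of (source_words, english_words) for sentences that are
           assumed to be translations of each other.
    Returns a dict mapping (f, e) -> probability of source word f given
    English word e.
    """
    source_vocab = {f for source, _ in pairs for f in source}
    # Start with no knowledge: every pairing is equally likely.
    t = defaultdict(lambda: 1.0 / len(source_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected times e "explains" f
        total = defaultdict(float)  # expected times e explains anything

        # E-step: split each source word's credit among the English words
        # in the same sentence pair, in proportion to the current guesses.
        for source, english in pairs:
            for f in source:
                norm = sum(t[(f, e)] for e in english)
                for e in english:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share

        # M-step: turn the accumulated credit back into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    return dict(t)


def best_source_word(t, e):
    """Return the source word with the highest t(f | e) for English word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


# A tiny hand-made corpus, purely for illustration.
toy_corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une fleur".split(), "a flower".split()),
]

t = train_word_translation(toy_corpus)
for e in ["the", "house", "flower", "a"]:
    print(e, "->", best_source_word(t, e))
```

The intuition: "la" shows up alongside "the" in two sentence pairs, so after one round it earns more credit for "the". That leaves "maison" as the better explanation for "house", and the estimates sharpen with each pass, all without a dictionary or grammar. Word order never enters into it, because each source word is weighed against every English word in its sentence pair regardless of position. A real system layers much more on top (phrases, reordering, a language model for the output side), so treat this as a toy.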
A Peter Norvig talk from Google Developer Day 2007, "Theorizing from Data: Avoiding the Capital Mistake", contains an accessible high-level explanation of the principles of statistical machine translation (starting at about 21:20).