How to build an accurate translation engine?
A few months ago I came up with my own formula for translating any source language (as computer characters) into a destination language (as computer characters). I am implementing it with Lua (for desktop users) and a C++ class (for native access) so that I can embed it in a web browser and similar applications. I am wondering whether something better already exists for this in C++ or Lua.
My engine sometimes fails to handle grammar, or even its own rules, correctly. Before I started building it I thought my approach would be the best way to get this done, but it is taking far too long now, and I am afraid it may turn out to be the wrong implementation. I would like to look at other approaches and compare them with mine.
Using Google Translate or a similar service is not my goal; I am building a translation engine of my own (like Google's or others'), where someone can supply their own dictionary and create rules.
Is there any existing translation framework or library (OpenCOG or Moses, for example) that translates a source language into a destination language?
For example: Arabic to Chinese, or English to Japanese? Or what else are Google and others using?
Any suggestions would be appreciated.
Thanks in advance.
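For concreteness, the kind of engine described above (a user-supplied dictionary plus user-written rules) might look roughly like the sketch below. This is only an illustration, not the asker's formula: the English-to-Spanish word list, the `ADJECTIVES` set, the `adjective_after_noun` rule and the `translate` function are all hypothetical names invented here, and Python is used only for brevity.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy data: a three-word English-to-Spanish dictionary.
DICTIONARY: Dict[str, str] = {"the": "el", "red": "rojo", "car": "coche"}
# Hypothetical word-class list used by the reordering rule below.
ADJECTIVES: Set[str] = {"red"}


def adjective_after_noun(tokens: List[str]) -> List[str]:
    """Reordering rule: move a known adjective after the word that follows it."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES and out[i + 1] not in ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return out


# Rules are applied in order to the source tokens, before dictionary lookup.
RULES: List[Callable[[List[str]], List[str]]] = [adjective_after_noun]


def translate(sentence: str) -> str:
    tokens = sentence.lower().split()
    for rule in RULES:
        tokens = rule(tokens)
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(DICTIONARY.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(translate("the red car"))
```

Even this tiny case hints at why the approach gets slow to build: every word class, agreement pattern and exception needs its own dictionary entry or rule.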
Comments (3)
I hate to discourage you, but you are trying to single-handedly solve the problem of Machine Translation. MT systems like Systran have been developed by teams of scientists and engineers for decades and they are still far from perfect.
Moses is a pretty good open source translation library for C++. cdec represents the current state of the art (but requires context-free grammars for both source and target language). Both require large amounts of training data, i.e. parallel corpora.
When you've finished, run to your university and demand a PhD.
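To make the "parallel corpora" requirement concrete, here is a minimal sketch of the classic IBM Model 1 idea: estimating word translation probabilities from sentence-aligned text with a few rounds of expectation-maximization. It is not how Moses or cdec are implemented; real systems add much more on top of word alignment. The three-sentence corpus and the function names `train_ibm_model1` and `best_translation` are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source sentence, target sentence) pairs.
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate P(target word | source word) with IBM Model 1 style EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # t_prob[(target_word, source_word)] starts out uniform.
    t_prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target word over the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t_prob[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t_prob[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t_prob[(tw, sw)] = c / total[sw]
    return t_prob


def best_translation(t_prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t_prob.items() if sw == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    model = train_ibm_model1(CORPUS)
    for word in ("the", "house", "book", "a"):
        print(word, "->", best_translation(model, word))
```

With only three sentence pairs the estimates are already usable for these four words, but covering real vocabulary and word order takes millions of aligned sentences, which is why the training data requirement dominates.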
Did you take a look at the Google Translator Toolkit API? By analyzing it you can get a glimpse of what it implements and what you may need in order to develop your own translation framework (which is a lot of work, by the way).
Creating/Uploading translation documents
Full list of supported source and target languages
http://www.leniel.net/2010/12/playing-google-translator-toolkit-api.html
More to the stack:
Free/open-source machine translation systems and tools
GNU gettext
TinyTM - Open-Source Translation Memory