We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I had to solve the same problem when integrating language translation with an XMPP chat server. I partitioned my payload (the text I needed to translate) into smaller subsets of complete sentences.
I can't recall the exact number, but with Google's REST-based translation URL I sent batches of complete sentences totalling at most 1024 characters each, so a large paragraph resulted in multiple translation service calls.
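A minimal sketch of that partitioning, assuming a naive sentence splitter (break after `.`, `!` or `?`) and the 1024-character limit mentioned above. The function names `split_sentences` and `chunk_sentences` are hypothetical, and the real limit for any given service should be checked against its documentation.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def chunk_sentences(text, limit=1024):
    """Group whole sentences into chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = sentence if not current else current + " " + sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is kept whole here;
            # a real integration would have to split it further.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then go out as one translation request, so a long paragraph turns into several calls.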
Break your big text into tokenized strings, and then pass each token through the translator via a loop. Store the translated output in an array and once all tokens are translated and stored in the array, put them back together and you will have a completely translated document.
Just to prove a point, I threw this together :) It is rough around the edges, but it will handle a whole lot of text, and it does just as well as Google for translation accuracy because it uses the Google API. I processed Apple's entire 2005 SEC 10-K filing with this code and the click of one button (it took about 45 minutes).
The result was basically identical to what you would get if you copied and pasted one sentence at a time into Google Translate. It isn't perfect (ending punctuation is not accurate and I didn't write to the text file line by line), but it does show a proof of concept. It could have better punctuation if you worked with Regex some more.
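The loop described above could look roughly like the sketch below. It is not the original code from this answer: `translate_document` is a hypothetical name, and `translate` stands for whatever function wraps your translation API call. The regex splits after end punctuation and keeps it attached to each sentence.

```python
import re

def translate_document(text, translate):
    """Translate `text` one sentence at a time using the `translate` callable."""
    # Tokenize into sentences, keeping the end punctuation with each one.
    tokens = [t for t in re.split(r"(?<=[.!?])\s+", text.strip()) if t]
    translated = []
    for token in tokens:
        # One translator call per token; results are stored in order.
        translated.append(translate(token))
    # Put the translated pieces back together into one document.
    return " ".join(translated)
```

Passing the translator in as an argument makes it easy to try the loop with a stub (for example `str.upper`) before pointing it at a real service.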
Use MyGengo. They have a free API for machine translation - I don't know what the quality is like, but you can also plug in human translation for a fee.
I'm not affiliated with them nor have I used them, but I've heard good things.
We used http://www.berlitz.co.uk/translation/.
We'd send them a database file with the English content, and a list of the languages we required, and they'd use various bilingual people to provide the translations. They also used voice-actors to provide WAV files for our telephone interface.
This was obviously not as fast as automated translation, and not free, but I think this sort of service is the only way to be sure your translation makes sense.
Google provides a useful tool, Google Translator Toolkit, which lets you upload files and translate them all at once into any language Google Translate supports.
It's free if you use the automated translations, but there is also an option to hire real people to translate your documents for you.
From Wikipedia:
Link
There are plenty of different machine translation APIs: Google, Microsoft, Yandex, IBM, PROMT, Systran, Baidu, YeeCloud, DeepL, SDL, and SAP.
Some of them support batch requests (translating an array of texts at once). I would translate sentence by sentence and handle 403/429 errors properly (these are typically returned when a quota is exceeded).
I can refer you to our recent evaluation study (November 2017): State of machine translation
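For the 403/429 handling, one common approach is to retry with exponential backoff. The sketch below assumes your own client wrapper raises a `QuotaError` when the service answers with one of those status codes. Both `QuotaError` and `translate_with_retry` are hypothetical names, not part of any vendor SDK, and whether a 403 is worth retrying at all depends on the API.

```python
import time

class QuotaError(Exception):
    """Hypothetical error a client wrapper raises for HTTP 403/429 responses."""
    def __init__(self, status):
        super().__init__(f"quota exceeded (HTTP {status})")
        self.status = status

def translate_with_retry(sentence, translate, max_attempts=5,
                         base_delay=1.0, sleep=time.sleep):
    """Call `translate(sentence)`, backing off exponentially on QuotaError."""
    for attempt in range(max_attempts):
        try:
            return translate(sentence)
        except QuotaError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is only there so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.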
Disclaimer: While I definitely find tokenizing suspect as a means of translation, splitting on sentences, as later illustrated by ubiquibacon, may produce results that meet your requirements.
I suggested that his code could be improved by reducing the 30+ lines of string munging to the one-line regex he asked for in another question, but the suggestion was not well received.
Here is an implementation using the Google API for .NET in VB.NET and C#.
File Program.cs
File Module1.vb
Input (stolen directly from ubiquibacon)
Results (to German for typoking):
It's pretty simple, and there are a few ways:
Here is an example (of the second one):
Method:
Method Call:
String translatedText = TranslateTextEnglishSpanish("hello world");
Result:
translatedText == "hola mundo";
You just need to get the parameters for each language and use them to get the translations you need.
You can get those values using the Live HTTP Headers add-on for Firefox.