If a website is localized/internationalized with a Simplified Chinese translation...
- Is it possible to reliably and automatically convert the text to Traditional Chinese in a high-quality way?
- If so, is it going to be extremely high quality or just a good starting point for a translator to tweak?
- Are there open source tools (ideally in PHP) to do such a conversion?
- Is the conversion better one way vs. the other (simplified -> traditional, or vice versa)?
Short answer: No, not reliably and with high quality. I wouldn't recommend automated tools unless the market isn't that important to you and you can risk certain publicly embarrassing flubs. You may find some localization firms are happier to start with a quality Simplified Chinese translation and adapt it to Traditional, but you may also find that many prefer to start with the English source.
Longer answer: In some cases only the glyphs differ, and they have different Unicode code points. But there are also idiomatic and vocabulary differences between the PRC and Taiwan/Hong Kong, and your quality will suffer if these aren't handled. Technical terms may be more or less problematic, depending on the era in which they came into common use. Automated tools may catch some of these issues, but not all of them. If you do go the route of converting things automatically, make sure you get sign-off from QA teams based in each of your target markets.
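To make the glyph-versus-vocabulary distinction concrete, here is a minimal Python sketch. The character table and the list of Taiwan terms are tiny hypothetical samples (real conversion tables have thousands of entries). A purely character-by-character converter produces traditional glyphs, but it still produces the mainland word.

```python
# Hypothetical, tiny simplified -> traditional character table (illustration only).
CHAR_S2T = {"软": "軟", "网": "網", "络": "絡"}

# Words a reader in Taiwan would expect for the same concepts:
# here the vocabulary differs, not just the glyphs.
TAIWAN_TERMS = {"软件": "軟體", "网络": "網路", "信息": "資訊"}


def convert_chars(text: str) -> str:
    """Map each character independently; characters not in the table pass through unchanged."""
    return "".join(CHAR_S2T.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for simplified, taiwan in TAIWAN_TERMS.items():
        converted = convert_chars(simplified)
        status = "ok" if converted == taiwan else "vocabulary mismatch"
        print(f"{simplified} -> {converted} (Taiwan usage: {taiwan}): {status}")
```

Glyph mapping alone yields 軟件 and 網絡, while a Taiwanese reader expects 軟體 and 網路. That gap is why a phrase-level table and a native reviewer are still needed.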
Additionally, there are sociopolitical concerns as well. For example, you can use terms like "Republic of China" in Taiwan, but this will royally piss off the Chinese government if it appears in your simplified Chinese version (and sometimes your English version); if you have an actual subsidiary or partner in China, the staff may be arrested solely on the basis of subversive terminology. (This is not unique to China; Pakistan/India and Turkey have similar issues). You can get into similar trouble by referring to "Taiwan" as a "country."
As a native Hong Konger myself, I concur with @JasonTrue: don't do it. You risk angering and offending your potential users in Taiwan and Hong Kong.
BUT, if you still insist on doing so, have a look at how Wikipedia does it; here is one implementation (note the license).
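Wikipedia's converter is table-driven and supports separate target variants (for example zh-tw and zh-hk). Below is a minimal Python sketch of that general idea. It is not MediaWiki's actual code or data, and the tables are hypothetical samples. Phrase entries are matched longest-first, so a region-specific word wins over the character-by-character fallback. In PHP, strtr() called with an array performs the same longest-key-first replacement.

```python
# Hypothetical sample tables; not MediaWiki's real conversion data.
BASE_S2T = {"软": "軟", "网": "網", "络": "絡"}
REGION_PHRASES = {
    "zh-tw": {"软件": "軟體", "网络": "網路"},
    "zh-hk": {"软件": "軟件", "网络": "網絡"},
}


def convert(text: str, region: str) -> str:
    """Longest-match replacement: try the longest table key at each position first."""
    table = {**BASE_S2T, **REGION_PHRASES[region]}
    max_len = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No key matched at this position; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("网络软件", "zh-tw"))
    print(convert("网络软件", "zh-hk"))
```

The same simplified input produces different traditional output per region, which is one reason a single "traditional" target is not enough.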
Other answers are focused on the difficulties, but these are exaggerated. For one thing, a substantial portion of the characters are exactly the same in both scripts. For another, the 'simplified' forms are exactly that: simplified forms of the traditional characters. That means there is mostly a one-to-one relationship between traditional and simplified characters.
A few things will need tweaking.
Not that I am aware of, though you might want to check out the Google Translate API.
A few characters lost their distinction in the simplified character set. For instance, 麵 (flour) was simplified to the same character as 面 (face, side). For this reason, traditional -> simplified conversion is slightly more accurate than the reverse.
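The asymmetry can be checked mechanically. Invert a traditional -> simplified table and look for simplified characters that have more than one traditional source. The table in this Python sketch is a small illustrative subset: the 麵/面 case plus 後/后 and 發/髮. Every traditional character maps to exactly one simplified character, but the reverse direction needs context to choose.

```python
from collections import defaultdict

# Illustrative subset of traditional -> simplified mappings (not a complete table).
T2S = {"麵": "面", "面": "面", "後": "后", "后": "后", "發": "发", "髮": "发", "國": "国"}


def simplified_collisions(t2s: dict[str, str]) -> dict[str, list[str]]:
    """Return the simplified characters that more than one traditional character maps to."""
    sources = defaultdict(list)
    for trad, simp in t2s.items():
        sources[simp].append(trad)
    return {simp: trads for simp, trads in sources.items() if len(trads) > 1}


if __name__ == "__main__":
    for simp, trads in simplified_collisions(T2S).items():
        print(f"{simp} <- {' / '.join(trads)}")
```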
I'd also like to point out that traditional characters are not used solely in Taiwan (they can also be found in Hong Kong, and occasionally even on the mainland).
I was able to find this and this. Need to create an account to download, though. Never used the site myself so I cannot vouch for it.
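Fundamentally, simplified Chinese has lost many distinctions. Several traditional characters were often merged into a single simplified one, so the information needed to pick the right traditional character is simply not in the text. No programming language in the world will be able to accurately convert simplified Chinese into traditional Chinese. You will just cause confusion for your intended audience (Hong Kong, Macau, Taiwan).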
A perfect example of failed conversion from simplified Chinese to traditional Chinese is the character "后". In its simplified form, it has two meanings: "behind" and "queen". When you attempt to convert it back to traditional Chinese, however, there are two character choices: 後 "behind" or 后 "queen". One funny example I came across is a translator that converted "皇后大道" (Queen's Road) to "皇後大道", which literally means Queen's Behind Road.
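The 皇后大道 failure is easy to reproduce with a naive converter that always picks a default traditional form for 后. The Python sketch below uses a hypothetical ambiguity table. It also shows a more defensible pattern: flag every position where the simplified character has several traditional candidates, so that a translator makes the decision instead of the converter guessing silently.

```python
# Hypothetical table: simplified characters with more than one traditional counterpart.
# The first candidate is the default that a naive converter always picks.
AMBIGUOUS_S2T = {"后": ["後", "后"], "面": ["面", "麵"], "发": ["發", "髮"]}


def naive_convert(text: str) -> str:
    """Always pick the first candidate; this is what produces 皇後大道."""
    return "".join(AMBIGUOUS_S2T[ch][0] if ch in AMBIGUOUS_S2T else ch for ch in text)


def flag_ambiguous(text: str) -> list[tuple[int, str, list[str]]]:
    """List (index, character, candidates) for every character that needs a human decision."""
    return [(i, ch, AMBIGUOUS_S2T[ch]) for i, ch in enumerate(text) if ch in AMBIGUOUS_S2T]


if __name__ == "__main__":
    source = "皇后大道"
    print(naive_convert(source))   # 皇後大道 (wrong: "Queen's Behind Road")
    print(flag_ambiguous(source))  # [(1, '后', ['後', '后'])]
```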
Unless your translation algorithm is super smart, it is bound to produce errors. So you're better off hiring a very good translator who's fluent in both types of Chinese.
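Short answer: Yes. And it's easy. You can first convert the text from UTF-8 to BIG5, then use one of the many tools that convert BIG5 to GBK, and finally convert GBK back to UTF-8. (This chain goes traditional -> simplified; for simplified -> traditional, run it in reverse: UTF-8 -> GBK -> BIG5 -> UTF-8.)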
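One caveat is worth seeing in code. The UTF-8 <-> BIG5/GBK steps in that chain are only re-encodings: they change the bytes, not the characters. The actual simplified/traditional mapping happens inside the BIG5-to-GBK tool. A minimal Python check using the standard big5 and gbk codecs:

```python
def round_trip(text: str, codec: str) -> str:
    """Encode to a legacy codec and decode back; only the byte representation changes."""
    return text.encode(codec).decode(codec)


traditional = "國語"
simplified = "国语"

if __name__ == "__main__":
    # Byte lengths differ: 3 bytes per character in UTF-8, 2 in BIG5/GBK...
    print(len(traditional.encode("utf-8")), len(traditional.encode("big5")))
    print(len(simplified.encode("utf-8")), len(simplified.encode("gbk")))
    # ...but the characters themselves come back unchanged.
    print(round_trip(traditional, "big5") == traditional)
    print(round_trip(simplified, "gbk") == simplified)
```

Encoding conversion alone (for example, iconv or PHP's mb_convert_encoding) therefore does not turn simplified characters into traditional ones. A character-mapping step is still required.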
I know nothing about any form of Chinese, but by looking at the examples on this Wikipedia page I'm inclined to think that automatic conversion is possible, since many of the phrases seem to use the same number of characters and even some of the same characters.
I ran a quick test using a multibyte ord() function, and I can't see any patterns that would allow automatic conversion without a (huge?) lookup translation table (LUTT). This might be a good place to start building the LUTT:
I got to this other linked answer that seems to agree (to some degree) with my reasoning:
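The code-point observation is easy to reproduce. This Python sketch uses a small, hand-picked sample of simplified/traditional pairs. It prints the Unicode code points of each pair and their offset. The offsets vary from pair to pair, so no arithmetic rule can replace a lookup table.

```python
# Small hand-picked sample of (simplified, traditional) pairs.
PAIRS = [("国", "國"), ("后", "後"), ("面", "麵"), ("发", "發")]


def offsets(pairs: list[tuple[str, str]]) -> list[int]:
    """Code-point difference (traditional - simplified) for each pair."""
    return [ord(t) - ord(s) for s, t in pairs]


if __name__ == "__main__":
    for (s, t), delta in zip(PAIRS, offsets(PAIRS)):
        print(f"{s} U+{ord(s):04X} -> {t} U+{ord(t):04X} (offset {delta:+d})")
    print("constant offset?", len(set(offsets(PAIRS))) == 1)
```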