Converting Simplified Chinese to Traditional Chinese

Posted 2024-11-07 03:58:57

If a website is localized/internationalized with a Simplified Chinese translation...

  • Is it possible to reliably automatically convert the text to Traditional Chinese in a high quality way?
  • If so, is it going to be extremely high quality or just a good starting point for a translator to tweak?
  • Are there open source tools (ideally in PHP) to do such a conversion?
  • Is the conversion better one way vs. the other (simplified -> traditional, or vice versa)?


Comments (6)

陈年往事 2024-11-14 03:58:57

Short answer: No, not reliably and not at high quality. I wouldn't recommend automated tools unless the market isn't that important to you and you can risk some publicly embarrassing flubs. You may find some localization firms are happier to start with a quality simplified Chinese translation and adapt it to traditional, but you may also find that many companies prefer to start with the English source.

Longer answer: In some cases only the glyphs differ, and they have different Unicode code points. But there are also idiomatic and vocabulary differences between the PRC and Taiwan/Hong Kong, and your quality will suffer if these aren't handled. Technical terms may be more or less problematic, depending on the era in which they came into common use. Automated tools may catch some of these issues, but not all of them. If you do go the route of automatic conversion, make sure you get sign-off from QA teams based in each of your target markets.
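To make the vocabulary point concrete, here is a minimal PHP sketch. The tables are hypothetical and tiny (real projects such as OpenCC ship thousands of entries). Character-by-character mapping gets the glyphs right but keeps Mainland wording, while a phrase table applied first can swap in Taiwan vocabulary such as 軟體 (software) and 網路 (network):

```php
<?php
// Hypothetical character-level table: Simplified -> Traditional glyphs.
$charTable = [
    '软' => '軟',
    '网' => '網',
    '络' => '絡',
];

// Hypothetical phrase-level table: Mainland vocabulary -> Taiwan vocabulary.
$phraseTableTW = [
    '软件' => '軟體', // software
    '网络' => '網路', // network
];

// Character-only conversion: correct glyphs, but Mainland wording.
function convertChars(string $text, array $charTable): string
{
    return strtr($text, $charTable);
}

// Phrase-then-character conversion. strtr() tries the longest keys first and
// never rescans replaced text, so merging the tables lets phrases win.
function convertWithPhrases(string $text, array $charTable, array $phraseTable): string
{
    return strtr($text, $phraseTable + $charTable);
}

echo convertChars('软件和网络', $charTable), "\n";                       // 軟件和網絡
echo convertWithPhrases('软件和网络', $charTable, $phraseTableTW), "\n"; // 軟體和網路
```

This only helps for phrases someone has already listed, which is exactly the limitation described above.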

There are also sociopolitical concerns. For example, you can use terms like "Republic of China" in Taiwan, but this will royally piss off the Chinese government if it appears in your simplified Chinese version (and sometimes your English version); if you have an actual subsidiary or partner in China, its staff may be arrested solely on the basis of subversive terminology. (This is not unique to China; Pakistan/India and Turkey have similar issues.) You can get into similar trouble by referring to "Taiwan" as a "country."

人生百味 2024-11-14 03:58:57

As a native Hong Konger myself, I concur with @JasonTrue: don't do it. You risk angering and offending your potential users in Taiwan and Hong Kong.

BUT, if you still insist on doing so, have a look at how Wikipedia does it; here is one implementation (note license).
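The links to Wikipedia's implementation are not preserved here, so the following is not that code. It is a hypothetical sketch of one common way to structure such a converter: a shared Simplified -> Traditional table plus small per-region vocabulary overrides, so zh-TW and zh-HK output can differ for the same input. The entries are illustrative only:

```php
<?php
// Hypothetical base table: Simplified -> Traditional characters.
$base = [
    '软' => '軟',
    '机' => '機',
];

// Hypothetical per-region vocabulary overrides layered on top of the base.
$overrides = [
    'zh-TW' => ['软件' => '軟體', '打印机' => '印表機'],
    'zh-HK' => [], // in this tiny example the base output is left as-is
];

function convertForRegion(string $text, string $region, array $base, array $overrides): string
{
    // Array union keeps the left-hand (region) value on key collisions,
    // and strtr() prefers the longest matching key.
    $table = ($overrides[$region] ?? []) + $base;
    return strtr($text, $table);
}

echo convertForRegion('软件和打印机', 'zh-TW', $base, $overrides), "\n"; // 軟體和印表機
echo convertForRegion('软件和打印机', 'zh-HK', $base, $overrides), "\n"; // 軟件和打印機
```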

捎一片雪花 2024-11-14 03:58:57

Is it possible to reliably automatically convert the text to Traditional Chinese in a high quality way?

Other answers focus on the difficulties, but these are exaggerated. First, a substantial portion of the characters are exactly the same in both scripts. Second, the simplified forms are exactly that: simplified forms of the traditional characters. That means there is mostly a one-to-one relationship between traditional and simplified characters.
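If the mapping really were one-to-one, conversion would reduce to a per-character table lookup. A minimal PHP sketch of that assumption, with a hypothetical three-entry table; characters shared by both scripts simply pass through unchanged:

```php
<?php
// Hypothetical excerpt of a Simplified -> Traditional character table.
// Characters shared by both scripts (e.g. 中, 的, 字) need no entry at all.
$s2t = [
    '汉' => '漢',
    '语' => '語',
    '国' => '國',
];

// Map each character independently, falling back to the input character.
function convertPerCharacter(string $text, array $table): string
{
    // The /u flag makes PCRE split on UTF-8 characters, not bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = '';
    foreach ($chars as $ch) {
        $out .= $table[$ch] ?? $ch;
    }
    return $out;
}

echo convertPerCharacter('中国的汉字', $s2t), "\n"; // 中國的漢字
```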

If so, is it going to be extremely high quality or just a good starting point for a translator to tweak?

A few things will need tweaking.

Are there open source tools (ideally in PHP) to do such a conversion?

Not that I am aware of, though you might want to check out the Google Translate API.

Is the conversion better one way vs. the other (simplified -> traditional, or vice versa)?

A few characters lost their distinction in the simplified character set. For instance, 麵 (flour, noodles) was simplified to the same character as 面 (face, side). For this reason, traditional -> simplified conversion would be slightly more accurate.
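The asymmetry can be checked mechanically: invert a Traditional -> Simplified table and look for Simplified characters with more than one source. A small PHP sketch with a hypothetical excerpt of such a table (麵/面 from above, plus the well-known merges 後/后 and 髮/發):

```php
<?php
// Hypothetical excerpt of a Traditional -> Simplified table.
$t2s = [
    '麵' => '面', // flour, noodles
    '面' => '面', // face, side
    '後' => '后', // behind
    '后' => '后', // queen
    '髮' => '发', // hair
    '發' => '发', // to emit, develop
    '漢' => '汉',
];

// Invert the table: for each Simplified character, list every Traditional source.
function invertTable(array $t2s): array
{
    $s2t = [];
    foreach ($t2s as $trad => $simp) {
        $s2t[$simp][] = $trad;
    }
    return $s2t;
}

$candidates = invertTable($t2s);
foreach ($candidates as $simp => $trads) {
    echo $simp, ' -> ', implode(', ', $trads), "\n";
}
```

Every Traditional -> Simplified lookup here has a single answer, whereas 面, 后 and 发 each have two Traditional candidates, so the reverse direction needs context to choose.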

I'd also like to point out that traditional characters are not used solely in Taiwan; they can be found in Hong Kong and occasionally even in the mainland.
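Because of this, it may be safer to pick the script from the full locale than to special-case Taiwan. A hypothetical PHP helper, assuming BCP 47-style tags (underscores tolerated): an explicit Hant/Hans subtag wins, TW, HK and MO are treated as Traditional regions, and everything else defaults to Simplified. The region list and the default are assumptions, not a complete policy:

```php
<?php
// Hypothetical helper: choose 'Hant' or 'Hans' for a Chinese locale tag.
function chineseScriptFor(string $tag): string
{
    $region = null;
    $subtags = explode('-', str_replace('_', '-', $tag));
    foreach (array_slice($subtags, 1) as $sub) {
        $sub = strtolower($sub);
        if ($sub === 'hant' || $sub === 'hans') {
            return ucfirst($sub); // explicit script subtag wins
        }
        if (strlen($sub) === 2) {
            $region = $sub; // two-letter region subtag, e.g. "tw"
        }
    }
    // Assumed default: these regions use Traditional, everything else Simplified.
    return in_array($region, ['tw', 'hk', 'mo'], true) ? 'Hant' : 'Hans';
}

echo chineseScriptFor('zh-HK'), "\n"; // Hant
echo chineseScriptFor('zh-CN'), "\n"; // Hans
```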


I was able to find this and this. You need to create an account to download them, though. I've never used the site myself, so I cannot vouch for it.

長街聽風 2024-11-14 03:58:57

Fundamentally, simplification merged many characters that traditional Chinese keeps distinct, so simplified text has lost information. No program in the world will be able to convert simplified Chinese into traditional Chinese with full accuracy. You will just cause confusion for your intended audience (Hong Kong, Macau, Taiwan).

A perfect example of failed conversion from simplified to traditional Chinese is the character "后". In simplified Chinese it has two meanings, "behind" and "queen". When you convert it back to traditional Chinese, however, there are at least two character choices: 後 "behind" or 后 "queen". One funny example I came across is a translator that converted "皇后大道" (Queen's Road) to "皇後大道", which literally means "Queen's Behind Road".
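A common mitigation is a phrase-level exception list checked before the character rule. A minimal PHP sketch with hypothetical one-entry tables, reproducing the 皇後大道 mistake and the fix:

```php
<?php
// Hypothetical tables: a naive character rule plus one phrase exception.
$charRules = ['后' => '後'];           // 后 most often means "behind/after"
$phraseExceptions = ['皇后' => '皇后']; // ...but in 皇后 ("queen") it must stay 后

$naive = strtr('皇后大道', $charRules);                     // 皇後大道 (wrong)
$fixed = strtr('皇后大道', $phraseExceptions + $charRules); // 皇后大道 (correct)

echo $naive, "\n", $fixed, "\n";
```

Because strtr() prefers the longest matching key, 皇后 is protected while a standalone 后, as in 以后, is still converted to 後. Every such exception has to be known in advance, which is this answer's point.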

Unless your translation algorithm is super smart, it is bound to produce errors. So you're better off hiring a very good translator who's fluent in both types of Chinese.

指尖微凉心微凉 2024-11-14 03:58:57

Short answer: Yes, and it's easy. First convert the text from UTF-8 to BIG5, then use one of the many tools that convert BIG5 to GBK, and finally convert GBK to UTF-8.
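One caveat worth checking before relying on this: a plain character-set conversion, such as PHP's mb_convert_encoding(), only changes how characters are encoded as bytes; it does not replace 漢 with 汉. The sketch below runs the UTF-8 -> BIG5 -> GBK -> UTF-8 chain with mbstring's converters ('BIG-5' and 'CP936' are mbstring's names for Big5 and the GBK code page) and gets the same characters back. Any actual script conversion would have to come from the particular BIG5-to-GBK tool's own character table, and Simplified characters that Big5 lacks may be replaced with a substitute character in the first step:

```php
<?php
// Re-encode UTF-8 -> BIG5 -> GBK (CP936) -> UTF-8 using mbstring only.
function reencodeChain(string $utf8): string
{
    $big5 = mb_convert_encoding($utf8, 'BIG-5', 'UTF-8');
    $gbk  = mb_convert_encoding($big5, 'CP936', 'BIG-5');
    return mb_convert_encoding($gbk, 'UTF-8', 'CP936');
}

// Same characters come back; only the byte encoding changed along the way.
echo reencodeChain('漢字'), "\n"; // 漢字
```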

宣告ˉ结束 2024-11-14 03:58:57

I know nothing about any form of Chinese, but looking at the examples on this Wikipedia page, I'm inclined to think that automatic conversion is possible, since many of the phrases seem to use the same number of characters and even some of the same characters.

I ran a quick test with a multibyte ord() function and can't see any pattern that would allow automatic conversion without a (huge?) lookup translation table (LUTT).

Traditional Chinese 漢字
Simplified Chinese  汉字

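// PHP 7.2+ ships a built-in mb_ord(); only define this fallback when it is
// missing, since redeclaring an existing function is a fatal error.
if (!function_exists('mb_ord'))
{
    function mb_ord($string)
    {
        // UCS-4BE stores each character as a 4-byte big-endian integer;
        // unpack('N', ...) reads the first one, i.e. the first code point.
        $result = unpack('N', iconv('UTF-8', 'UCS-4BE', $string));

        return is_array($result) ? $result[1] : false;
    }
}

var_dump(mb_ord('漢'), mb_ord('字')); // 28450, 23383
var_dump(mb_ord('汉'), mb_ord('字')); // 27721, 23383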
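Since the code points follow no usable pattern, the lookup table has to come from data. A minimal PHP sketch of loading one, assuming a hypothetical plain-text format with one tab-separated simplified/traditional pair per line and '#' comment lines (not the format of any particular project):

```php
<?php
// Parse "simplified<TAB>traditional" lines into an associative array
// usable with strtr() or a per-character lookup.
function parseLookupTable(string $data): array
{
    $table = [];
    foreach (preg_split('/\R/u', $data) ?: [] as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $parts = explode("\t", $line);
        if (count($parts) === 2) {
            $table[$parts[0]] = $parts[1];
        }
    }
    return $table;
}

$data = "# simplified\ttraditional\n汉\t漢\n语\t語\n";
$lut = parseLookupTable($data);
echo strtr('汉语', $lut), "\n"; // 漢語
```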

This might be a good place to start building the LUTT:

I found this other linked answer that seems to agree (to some degree) with my reasoning:

There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc).
