Translation table for all world languages

Posted on 2024-07-25 21:31:49

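Can anybody tell me where I can find a translation table for the alphabets of all world languages, including Russian, Greek, Thai, and so on? I need a function that creates pretty URLs from text in any language. And because we don't know anything about Japanese, I'm trying it this way. Thank you for your replies.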

冷月断魂刀 2024-08-01 21:31:50

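It sounds like what you want is a transliteration table. Try some of the links on that page. If you only want to use it for HTTP URLs, take a look at percent-encoding.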

半暖夏伤 2024-08-01 21:31:50

Transliteration in general is non-trivial; see the Unicode Transliteration Guidelines. The answer to your question, bluntly, is that the table you're looking for doesn't exist.

That said, there are a few workarounds available, like Sean M. Burke's Unidecode Perl module (and its ports to Ruby and Python). But as he points out, you're not going to get a transliteration of, say, Thai or Japanese that's usefully readable from such a conversion.

Take a look at the following test session using the Python port:

#!/usr/bin/env python3
# Requires the third-party Unidecode package.
from unidecode import unidecode

# One sample per line: the greeting, then the name of its language.
hello = """Hello world! English
Salut le monde! French
Saluton Mondo! Esperanto
Sveika, pasaule! Latvian
Tere, maailm! Estonian
Merhaba dünya! Turkish
Olá mundo! Portuguese
안녕, 세상! Korean
你好，世界！ Chinese
こんにちは 世界! Japanese
ሠላም ዓለም! Amharic
哈佬世界! Cantonese
Привет, мир! Russian
Καλημέρα κόσμε! Greek
สวัสดีราคาถูก! Thai"""

samples = []
for line in hello.splitlines():
    # The last whitespace-separated token is the language; the rest is the text.
    *words, language = line.split()
    samples.append((language, ' '.join(words)))

# Print the language, the original text, and its ASCII approximation.
for language, text in samples:
    print(language.upper())
    print(text)
    print(unidecode(text))
    print()

Which outputs:

ENGLISH
Hello world!
Hello world!

FRENCH
Salut le monde!
Salut le monde!

ESPERANTO
Saluton Mondo!
Saluton Mondo!

LATVIAN
Sveika, pasaule!
Sveika, pasaule!

ESTONIAN
Tere, maailm!
Tere, maailm!

TURKISH
Merhaba dünya!
Merhaba dunya!

PORTUGUESE
Olá mundo!
Ola mundo!

KOREAN
안녕, 세상!
annyeong, sesang!

CHINESE
你好，世界！
Ni Hao ,Shi Jie !

JAPANESE
こんにちは 世界!
konnitiha Shi Jie !

AMHARIC
ሠላም ዓለም!
szalaame `aalame!

CANTONESE
哈佬世界!
Ha Lao Shi Jie !

RUSSIAN
Привет, мир!
Priviet, mir!

GREEK
Καλημέρα κόσμε!
Kalemera kosme!

THAI
สวัสดีราคาถูก!
swasdiiraakhaathuuk!

For languages that are Latin-ish in the first place, it's quite useful: it strips accent marks. Outside of those, things get dicey fast.
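To make the accent-stripping part concrete, here is a minimal sketch that uses only Python's standard `unicodedata` module. It is not how Unidecode works internally (that is an assumption-free statement only about this sketch: it decomposes characters and drops the combining marks), and the helper name `strip_accents` is made up for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Merhaba dünya!"))   # Merhaba dunya!
print(strip_accents("Olá mundo!"))       # Ola mundo!
# Non-Latin scripts stay non-Latin; only the accent on the vowels is lost.
print(strip_accents("Καλημέρα κόσμε!"))  # Καλημερα κοσμε!
```

This helps with Latin-script text but does nothing to turn Greek, Cyrillic, or Thai into ASCII, which is exactly where things get dicey.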

If you compare the Chinese and Japanese examples, you'll see that the sequence 世界 is transliterated Shi Jie in both. That's wrong for the Japanese: its "transliteration" (or better, "reading") should be sekai. The Russian and Greek are not too bad. But the Amharic and Thai are abysmal; I would guess that they're not even legible to someone who's fluent in those languages.

The general problem here is that transliteration is not something that can be defined unless language-specific information is also taken into account, and even determining language is non-trivial: how is your program supposed to know if 世界 is in Japanese or Chinese?
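A small sketch of why the language can't be recovered from the characters alone: 世界 is the same pair of code points whether it appears in Chinese or Japanese text. The `READINGS` table and the `romanize` helper are hypothetical and cover just this one word; they only illustrate that the caller has to supply the language.

```python
import unicodedata

word = "世界"

# The code points carry no language information.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Hypothetical one-word lookup: the reading depends on a language tag
# that has to come from somewhere other than the text itself.
READINGS = {
    ("世界", "zh"): "shijie",
    ("世界", "ja"): "sekai",
}


def romanize(text: str, lang: str) -> str:
    return READINGS[(text, lang)]


print(romanize(word, "zh"))
print(romanize(word, "ja"))
```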

A better policy than trying to force hackish transliteration into your application is to figure out how to support Unicode properly in the first place. If you have to have an all-ASCII representation of non-Latin-script text, use URL encoding.
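As a rough sketch of that approach in Python, assuming the goal is a URL slug: keep the letters, marks, and digits of whatever script the title is in, join the words with hyphens, and let `urllib.parse.quote` produce the all-ASCII form. The function names `make_slug` and `slug_to_path` are invented for this example, and the word-splitting rule is a simplification.

```python
import unicodedata
from urllib.parse import quote, unquote


def make_slug(title: str) -> str:
    """Keep letters, marks and digits of any script; hyphenate the rest."""
    kept = [
        ch if unicodedata.category(ch)[0] in "LMN" else " "
        for ch in title.lower()
    ]
    return "-".join("".join(kept).split())


def slug_to_path(slug: str) -> str:
    """Percent-encode the slug's UTF-8 bytes so the result is pure ASCII."""
    return quote(slug, safe="")


slug = make_slug("Привет, мир!")
path = slug_to_path(slug)
print(slug)           # привет-мир
print(path)           # ASCII-only, percent-encoded form
print(unquote(path))  # decodes back to the Unicode slug
```

Nothing is lost in the round trip, which is the point: the Unicode slug stays readable to people who know the script, and the encoded form is only the wire representation.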
