Translation table for all the world's languages

Posted 2024-07-25 21:31:49, 109 words, 6 views, 0 comments

Can anyone tell me where I can find a translation table for the letters of all the world's languages, including Russian, Greek, Thai, etc.? I need a function to create fancy URLs from text in any language. And because we know nothing about, for example, Japanese, I am trying it this way. Thanks for your replies.

Comments (4)

冷月断魂刀 2024-08-01 21:31:50

Sounds like what you want is a transliteration table. Try some of the links on that page. If you want it only for HTTP URLs, have a look at percent-encoding.
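For the percent-encoding route, here is a minimal sketch using Python's standard library. The helper name `percent_encode` is made up for illustration; `urllib.parse.quote` and `unquote` are the real stdlib calls. It assumes the text is encoded as UTF-8, which is `quote`'s default.

```python
from urllib.parse import quote, unquote


def percent_encode(text):
    """Return an all-ASCII, percent-encoded form of text (UTF-8 bytes)."""
    # safe="" also encodes "/", so the result is usable as one path segment.
    return quote(text, safe="")


if __name__ == "__main__":
    title = "Привет, мир!"
    encoded = percent_encode(title)
    print(encoded)
    # The encoding is lossless: decoding gives the original text back.
    print(unquote(encoded) == title)
```

The result is not readable by humans, but it needs no per-language knowledge and round-trips exactly.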

半暖夏伤 2024-08-01 21:31:50

Transliteration in general is non-trivial, see the Unicode Transliteration Guidelines. The answer to your question, bluntly, is that the table you're looking for doesn't exist.

That said, there are a few work-arounds available, like Sean M. Burke's Unidecode Perl module (and its ports to Ruby and Python). But as he points out, you're not going to get a transliteration for, say, Thai or Japanese that's usefully readable from such a conversion.

Take a look at the following test session using the Python port:

#!/usr/bin/env python3
# Requires the third-party Unidecode package (pip install Unidecode).
from unidecode import unidecode

# One sample per line: the greeting, then the language name as the last word.
hello = """Hello world! English
Salut le monde! French
Saluton Mondo! Esperanto
Sveika, pasaule! Latvian
Tere, maailm! Estonian
Merhaba dünya! Turkish
Olá mundo! Portuguese
안녕, 세상! Korean
你好,世界! Chinese
こんにちは 世界! Japanese
ሠላም ዓለም! Amharic
哈佬世界! Cantonese
Привет, мир! Russian
Καλημέρα κόσμε! Greek
สวัสดีราคาถูก! Thai"""

samples = []
for line in hello.splitlines():
    # Split the final word (the language name) off the greeting text.
    text, language = line.rsplit(None, 1)
    samples.append((language, text))

for language, text in samples:
    print(language.upper())
    print(text)             # the original text
    print(unidecode(text))  # Unidecode's ASCII approximation
    print()

Which outputs:

ENGLISH
Hello world!
Hello world!

FRENCH
Salut le monde!
Salut le monde!

ESPERANTO
Saluton Mondo!
Saluton Mondo!

LATVIAN
Sveika, pasaule!
Sveika, pasaule!

ESTONIAN
Tere, maailm!
Tere, maailm!

TURKISH
Merhaba dünya!
Merhaba dunya!

PORTUGUESE
Olá mundo!
Ola mundo!

KOREAN
안녕, 세상!
annyeong, sesang!

CHINESE
你好,世界!
Ni Hao ,Shi Jie !

JAPANESE
こんにちは 世界!
konnitiha Shi Jie !

AMHARIC
ሠላም ዓለም!
szalaame `aalame!

CANTONESE
哈佬世界!
Ha Lao Shi Jie !

RUSSIAN
Привет, мир!
Priviet, mir!

GREEK
Καλημέρα κόσμε!
Kalemera kosme!

THAI
สวัสดีราคาถูก!
swasdiiraakhaathuuk!

For languages that are Latin-ish in the first place, it's quite useful: it strips accent marks. Outside of those, things get dicey fast.
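If accent stripping is all you need, it can be approximated with the standard library alone. This is a sketch of that idea, not how Unidecode itself works (Unidecode uses its own lookup tables); `strip_accents` is a name made up here.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after decomposing accented characters."""
    # NFKD splits "ü" into "u" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for sample in ("Merhaba dünya!", "Olá mundo!", "Привет, мир!"):
        print(strip_accents(sample))
```

Text in non-Latin scripts comes through still in its own script, so this only helps for the Latin-ish cases.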

If you compare the Chinese and Japanese examples, you'll see that the sequence 世界 is transliterated Shi Jie in both. That's wrong: the "transliteration" (or better, "reading") of the Japanese should be sekai. The Russian and Greek are not too bad. But Amharic and Thai are abysmal; I would guess that they're not even legible to someone who's fluent in those languages.

The general problem here is that transliteration is not something that can be defined unless language-specific information is also taken into account, and even determining language is non-trivial: how is your program supposed to know if 世界 is in Japanese or Chinese?

A better policy than trying to force hackish transliteration into your application is to figure out how to support Unicode properly in the first place. If you have to have an all-ASCII representation of non-Latin-script text, use URL encoding.
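One way to read "support Unicode properly" for URL slugs is to keep the original letters and only normalize the separators. The sketch below does that; `unicode_slug` is a hypothetical helper, and the choice of which character categories to keep is an assumption, not something from the answer above.

```python
import unicodedata


def unicode_slug(text):
    """Lowercase text and join runs of letters/marks/digits with hyphens."""
    out = []
    for ch in text.lower():
        # Keep letters (L*), combining marks (M*) and numbers (N*).
        # Marks are kept so scripts such as Thai are not split mid-word.
        if unicodedata.category(ch)[0] in "LMN":
            out.append(ch)
        elif out and out[-1] != "-":
            # Collapse any run of other characters into a single hyphen.
            out.append("-")
    return "".join(out).strip("-")


if __name__ == "__main__":
    for sample in ("Hello world!", "Привет, мир!", "こんにちは 世界!"):
        print(unicode_slug(sample))
```

If an all-ASCII form is required, the slug can then be passed through `urllib.parse.quote`, which percent-encodes its UTF-8 bytes.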

翻身的咸鱼 2024-08-01 21:31:50

Didn't understand your question correctly.
Are you looking for something like this?

http://www.joelonsoftware.com/articles/Unicode.html

冬天旳寂寞 2024-08-01 21:31:50

You can always try to convert the text into iso-8859-1 (for example with iconv, if you are in PHP) and then simply replace spaces and all those bad characters that are valid in iso-8859-1 but not in a URL ;-)
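A rough Python rendering of that idea, to show what it does to non-Latin text. This is a sketch under assumptions: `latin1_slug` is a made-up name, characters that don't fit in iso-8859-1 are simply dropped, and everything except ASCII letters and digits is treated as a "bad character".

```python
import re


def latin1_slug(text):
    """Squeeze text through ISO-8859-1, then keep only ASCII letters/digits."""
    # Characters with no ISO-8859-1 representation are silently dropped.
    latin1 = text.encode("iso-8859-1", errors="ignore").decode("iso-8859-1")
    # Replace every run of anything that is not an ASCII letter or digit.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", latin1).strip("-")
    return slug.lower()


if __name__ == "__main__":
    for sample in ("Olá mundo!", "Привет, мир!"):
        print(repr(latin1_slug(sample)))
```

Latin-script text survives minus its accented letters, while Cyrillic, Greek, Thai and so on come out as an empty slug, so this only covers Western European input.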
