Localization of singular/plural words - what are the different language rules for grammatical numbers?
I have been developing a .NET string formatting library to assist with localization of an application. It's called SmartFormat and is open-source on GitHub.
One of the issues it tries to address is Grammatical Numbers. This is also known as "singular and plural forms" or "conditional formatting", and here's a snippet of what it looks like in English:
var message = "There {0:is|are} {0} {0:item|items} remaining";
// You can use the Smart.Format method just like using String.Format:
var output = Smart.Format(CultureInfo.CurrentUICulture, message, items.Count);
The English rule, as I'm sure you know, is that there are 2 forms (singular and plural) that can apply to nouns, verbs, and adjectives. If the quantity is 1, the singular is used; otherwise, the plural is used.
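For reference, the English rule fits in a tiny selector for a pipe-separated choice segment such as is|are. This is a minimal sketch: the class name EnglishChoice and its methods are hypothetical and are not SmartFormat's actual implementation.

```csharp
using System;

// Hypothetical helper (not SmartFormat's real code): picks one option from a
// "singular|plural" choice segment using the English two-form rule.
public static class EnglishChoice
{
    // English has two forms: index 0 (singular) when the quantity is 1, index 1 (plural) otherwise.
    public static int FormIndex(long quantity) => quantity == 1 ? 0 : 1;

    // Splits a segment such as "is|are" and returns the option for the given quantity.
    public static string Select(string options, long quantity)
    {
        string[] forms = options.Split('|');
        int index = FormIndex(quantity);
        // Fall back to the last option if fewer forms were supplied than the rule expects.
        return forms[Math.Min(index, forms.Length - 1)];
    }
}
```

With this helper, Select("is|are", 1) yields "is" and Select("item|items", 0) yields "items". A language with more forms needs a different FormIndex, which is exactly where the English assumption breaks down.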
I am now trying to "broaden my horizons" by implementing the correct rules for other languages! I have come to understand that some languages can have up to 4 plural forms, and it takes some logic to determine the correct form. I would like to expand my code to accommodate multiple languages. For example, I've heard that Russian, Polish, and Turkish have rules quite different from English, so that might be a good starting point.
However, I only speak English and Spanish, so how can I determine the correct grammatical rules for many common languages?
Edit: I also would like to know some good non-English "test phrases" for my unit tests here: What are some good non-English phrases with singular and plural forms that can be used to test an internationalization and localization library?
Definitely, different languages have different pluralization rules. Arabic and Polish are especially interesting, since both have quite a few plural forms.
If you want to learn more about these rules, see the Unicode Common Locale Data Repository (CLDR), specifically its Language Plural Rules.
There is quite a lot of interesting information there; unfortunately, some of it is wrong. I hope the plural forms are correct (at least for Polish they are, as far as I can tell :) ).
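To give a feel for what the CLDR rules look like once turned into code, here is a sketch of the Russian and Arabic categories. It assumes whole numbers only (CLDR operand v = 0); fractional values follow separate clauses that are not handled here, and the class name CldrIntegerRules is hypothetical.

```csharp
using System;

// Sketch of two CLDR plural rule sets, restricted to whole numbers (operand v = 0).
// Fractions are not handled; in CLDR they fall into "other" for both languages.
public static class CldrIntegerRules
{
    // Russian: one = ends in 1 except 11; few = ends in 2-4 except 12-14; many = the rest.
    public static string Russian(long n)
    {
        n = Math.Abs(n); // CLDR rules use the absolute value
        long mod10 = n % 10, mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return "one";
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return "few";
        return "many";
    }

    // Arabic: zero, one, two, few (n % 100 = 3..10), many (n % 100 = 11..99), other.
    public static string Arabic(long n)
    {
        n = Math.Abs(n);
        long mod100 = n % 100;
        if (n == 0) return "zero";
        if (n == 1) return "one";
        if (n == 2) return "two";
        if (mod100 >= 3 && mod100 <= 10) return "few";
        if (mod100 >= 11) return "many";
        return "other"; // e.g. 100, 101, 102
    }
}
```

For unit tests, the Russian noun файл (file) exercises every integer category: 1 файл, 2 файла, 5 файлов, 11 файлов, 21 файл, 22 файла. Note that 21 is "one" and 11 is "many", which catches implementations that only compare against 1.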
It would be nice if you provided a sample of the rules you're using in the question body. What format do they take?
Anyway, in your example ("There {0:is|are} {0} {0:item|items} remaining"), you seem to assume that the selection in both choice segments is based on the same single rule, and that the two choices correspond directly: the same single rule would choose either (is, item) or (are, items).
This assumption is not necessarily correct for other languages. Take, for example, the fictitious language English-ez (I use it only to make things easier for the reader to follow, since I find examples in foreign languages irritating; it borrows from Arabic but simplifies a lot). The rules for this language are as follows:
The first selection segment is the same as normal English:
The second selection segment has a different rule from normal English, assume the following simple rule:
Now the single-rule solution would not be adequate, so we can suggest a different form:
This solution might have problems in other situations, but we are discussing the example you provided.
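The underlying point, that each choice segment may need its own rule, can be sketched as follows. Everything here is hypothetical: the {0:rule:form1|form2|...} segment syntax, the rule names, and the English-ez noun rule (an invented singular/dual/plural split loosely modeled on Arabic's dual, with the made-up dual form "itemz") are illustrations only, not the rules from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical sketch: every choice segment names the rule that selects its form,
// so the verb and the noun no longer have to share a single rule.
// Assumed segment syntax (illustration only): {0:rule:form1|form2|...}
public static class PerSegmentFormatter
{
    // English verb rule: 0 = singular, 1 = plural.
    static int English(long n) => n == 1 ? 0 : 1;

    // Invented English-ez noun rule: 0 = singular, 1 = dual, 2 = plural.
    static int EnglishEzNoun(long n) => n == 1 ? 0 : n == 2 ? 1 : 2;

    static readonly Dictionary<string, Func<long, int>> Rules = new Dictionary<string, Func<long, int>>
    {
        ["en"] = English,
        ["ez-noun"] = EnglishEzNoun,
    };

    public static string Format(string template, long count)
    {
        // Replace each {0:rule:a|b|c} with the form chosen by that segment's own rule.
        string result = Regex.Replace(template, @"\{0:([\w-]+):([^}]*)\}", m =>
        {
            int index = Rules[m.Groups[1].Value](count);
            string[] forms = m.Groups[2].Value.Split('|');
            return forms[Math.Min(index, forms.Length - 1)];
        });
        // Plain {0} placeholders receive the number itself.
        return result.Replace("{0}", count.ToString(CultureInfo.InvariantCulture));
    }
}
```

With the template "There {0:en:is|are} {0} {0:ez-noun:item|itemz|items} remaining", a count of 2 selects "are" from the verb rule but the dual "itemz" from the noun rule, a combination that no single shared rule over (is, item)/(are, items) can produce.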
Have a look at gettext (which selects a full message based on a single level, i.e. one number) and ICU (which selects a full message at multiple levels, i.e. based on multiple variables).
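The gettext approach can be sketched like this: the translator supplies one complete message per plural index, and one formula picks the index from a single number. The formula below is the Polish Plural-Forms expression from the GNU gettext manual transcribed into C#; the class name and the in-code message table (standing in for a .po entry's msgstr[0..2]) are hypothetical.

```csharp
using System.Globalization;

// Sketch of gettext-style selection: whole messages, one index chosen from one number.
public static class GettextStyle
{
    // Polish header: nplurals=3;
    // plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
    public static int PolishPluralIndex(long n) =>
        n == 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2;

    // Hypothetical stand-in for msgstr[0..2]; note that the verb changes along with the noun.
    static readonly string[] PolishMessages =
    {
        "Pozostał {0} element",    // msgstr[0]: 1
        "Pozostały {0} elementy",  // msgstr[1]: 2-4, 22-24, ...
        "Pozostało {0} elementów", // msgstr[2]: 0, 5-21, 25-31, ...
    };

    public static string RemainingItems(long n) =>
        string.Format(CultureInfo.InvariantCulture, PolishMessages[PolishPluralIndex(n)], n);
}
```

Because the whole sentence is chosen at once, the verb and noun never have to agree through separate segments. ICU's MessageFormat goes further by allowing plural (and select) choices on several arguments in one message, which this sketch does not attempt.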
The approach you have taken might work in most cases in English and Spanish, but it most likely fails for many other languages. The problem is that you have only one pattern that tries to handle all grammatical numbers.
You need one pattern for each grammatical number. Here I have combined two patterns into a single multi-pattern string.
English uses two grammatical numbers: singular and plural. The keyword "one" starts the singular pattern and "other" starts the plural pattern.
When translating to Finnish, for example, which has the same two grammatical numbers, you would use
Japanese, however, uses only one grammatical number, so a Japanese string would contain only "other". Polish uses three grammatical numbers for whole numbers, so its string would contain "one", "few" and "many" (CLDR also defines "other" for Polish, used for fractional values).
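The "one pattern per category" idea can be sketched with a plain dictionary from CLDR category to pattern. This is a sketch under stated assumptions: the language codes, the dictionary layout and the class name CategoryPatterns are hypothetical and are not the multi-pattern syntax of the library mentioned in this answer, and the rules cover non-negative whole numbers only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch: one pattern per CLDR plural category, chosen by a per-language rule.
// Rules cover non-negative whole numbers only; the layout is hypothetical.
public static class CategoryPatterns
{
    static string EnglishOrFinnish(long n) => n == 1 ? "one" : "other";

    static string Japanese(long n) => "other"; // a single form for every quantity

    static string Polish(long n)
    {
        long mod10 = n % 10, mod100 = n % 100;
        if (n == 1) return "one";
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return "few";
        return "many";
    }

    static readonly Dictionary<string, Func<long, string>> Rules = new Dictionary<string, Func<long, string>>
    {
        ["en"] = EnglishOrFinnish,
        ["fi"] = EnglishOrFinnish,
        ["ja"] = Japanese,
        ["pl"] = Polish,
    };

    public static string Format(string language, IDictionary<string, string> patterns, long count)
    {
        string category = Rules[language](count);
        // Fall back to "other" if the translator did not supply this category.
        string pattern = patterns.TryGetValue(category, out var p) ? p : patterns["other"];
        return string.Format(CultureInfo.InvariantCulture, pattern, count);
    }
}
```

A Polish translator would supply three patterns, for example "{0} plik", "{0} pliki" and "{0} plików" (file), giving 1 plik, 22 pliki and 5 plików, while a Japanese translator supplies only the "other" entry. Keeping the category names fixed (from CLDR) lets each language provide exactly the forms it needs.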
Secondly, you need the proper rules to choose the right pattern among the multiple patterns. The Unicode Consortium's CLDR provides these rules in XML files.
I have implemented an open-source library that uses the CLDR rules (converted from XML into C# code and included in the library) and multi-pattern strings to support both grammatical numbers and grammatical genders.
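As an illustration of what converting a CLDR rule into C# involves, here is a sketch for Polish. CLDR rules are written in terms of operands of the displayed number: n (absolute value), i (integer digits) and v (number of visible fraction digits); f, t, w and others exist but are not needed for Polish. The struct and class names are hypothetical, and this is not the code of the library above. System.Decimal keeps its scale (1.0m has one fraction digit), which makes v exact.

```csharp
using System;

// CLDR operands for a decimal value (only those the Polish rule needs).
public readonly struct PluralOperands
{
    public decimal N { get; } // absolute value
    public decimal I { get; } // integer digits
    public int V { get; }     // count of visible fraction digits

    public PluralOperands(decimal value)
    {
        decimal n = Math.Abs(value);
        N = n;
        I = decimal.Truncate(n);
        V = (decimal.GetBits(value)[3] >> 16) & 0xFF; // scale byte of the decimal
    }
}

public static class PolishCldr
{
    // CLDR (pl): one:  i = 1 and v = 0
    //            few:  v = 0 and i % 10 = 2..4 and i % 100 != 12..14
    //            many: v = 0 and i != 1 and i % 10 = 0..1
    //                  or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14
    //            other: everything else (e.g. 1.0, 2.5)
    public static string Category(decimal value)
    {
        var o = new PluralOperands(value);
        decimal i10 = o.I % 10m, i100 = o.I % 100m;
        if (o.I == 1 && o.V == 0) return "one";
        if (o.V == 0 && i10 >= 2 && i10 <= 4 && !(i100 >= 12 && i100 <= 14)) return "few";
        if (o.V == 0 && ((o.I != 1 && i10 <= 1) || i10 >= 5 || (i100 >= 12 && i100 <= 14)))
            return "many";
        return "other";
    }
}
```

This is why Polish has four CLDR categories even though whole numbers only ever reach three: "1" is "one", but "1.0" and "2.5" fall into "other".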
https://github.com/jaska45/I18N
Using this library, your sample turns into