I am working on a large multilingual website and I am considering different approaches for making it multilingual. The possible alternatives I can think of are:
- The Gettext functions with generation of .po files
- One MySQL table with the translations and a unique string ID for each text
- PHP-files with arrays containing the different translations with unique string IDs
As far as I understand, the Gettext functions should be the most efficient, but my requirement is that it must be possible to change a text string in the original reference language (English) without the other translations of that string automatically reverting to English just because a couple of words changed. Is this possible with Gettext?
What is the least resource-demanding solution?
Are the Gettext functions and PHP files with arrays more or less equally resource-demanding?
Any other suggestions for more efficient solutions?
A few considerations:
1. Translations
Who will be doing the translations? People who are also connected to the site? A translation agency? When using Gettext you'll be working with .po files (usually created from a .pot template). These files contain the message ID (msgid) and the message string (msgstr, the translation). Example:
Now, this looks fine and understandable for anyone who needs to translate it. But what happens when you use keywords, as Mike suggests, instead of full sentences? If someone needs to translate a msgid called "address_home", he or she has no clue whether it should be a header ("Home address") or a full sentence. In this case, make sure to add a comment in the source right before you call the gettext function, like so:
Using
xgettext --add-comments=///
when creating the .po files will add these comments. However, I don't think Gettext is meant to be used this way. Also, if you need to add a comment for every text you want to display, a) you'll probably make an error at some point, b) your whole script will be filled with the texts anyway, only in comment form, and c) the comments need to be placed directly above the Gettext call, which isn't always convenient, depending on where the call sits in your code.
2. Maintenance
Once your site grows (even further) and your language files along with it, it might get pretty hard to maintain all the different translations this way. Every time you add a text, you need to create new files, send the files out to translators, receive them back, make sure the structure is still intact (eager translators are always happy to translate the syntax as well, making the whole file unusable :)), and finish by importing the new translations. It's doable, sure, but be aware of the possible problems on this end with large sites and many different languages.
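Part of the "make sure the structure is still intact" step can be automated. Below is a minimal sketch, assuming (unlike the .po files discussed above) that translations come back as PHP arrays keyed by message ID and that texts use printf-style placeholders. `validateTranslation()` and `placeholders()` are hypothetical helpers: they report IDs that went missing or appeared (a translated ID shows up as both) and texts whose placeholders changed.

```php
<?php
// Hypothetical helpers for checking a returned translation against the
// English source before importing it.

// Collect printf-style placeholders such as %s, %d or %1$s, sorted.
function placeholders(string $text): array
{
    preg_match_all('/%(?:\d+\$)?[sdf]/', $text, $m);
    sort($m[0]);
    return $m[0];
}

function validateTranslation(array $source, array $translated): array
{
    $problems = [];
    // IDs that exist in English but were lost in the translation.
    foreach (array_diff_key($source, $translated) as $id => $_) {
        $problems[] = "missing: $id";
    }
    // IDs that only exist in the translation (e.g. a translated key).
    foreach (array_diff_key($translated, $source) as $id => $_) {
        $problems[] = "unexpected: $id";
    }
    // Texts whose placeholders no longer match the English original.
    foreach (array_intersect_key($translated, $source) as $id => $text) {
        if (placeholders($source[$id]) !== placeholders($text)) {
            $problems[] = "placeholders changed: $id";
        }
    }
    return $problems;
}

$en = ['welcome' => 'Welcome, %s!', 'logout' => 'Log out'];
$de = ['willkommen' => 'Willkommen, %s!', 'logout' => 'Abmelden'];
print_r(validateTranslation($en, $de));
```

Here the translator translated the ID `welcome` itself, so the check reports it as missing and `willkommen` as unexpected.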
Another option: combine your 2nd and 3rd alternatives.
Personally, I find it more useful to manage the translations using a (simple) CMS, keep the variables and translations in a database, and export the relevant texts to language files yourself. In your code you would then call something like:
text('var');
or maybe:
__('faq','register','lost_password_text');
The export step can be as simple as selecting all the relevant variables and translations from the database, putting them in an array and writing the serialized array to a file.
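A minimal sketch of that export step and of a lookup helper in the style of `__('faq','register','lost_password_text')`. The rows stand in for a hypothetical database table with the columns `page`, `section`, `var`, `lang` and `text`; `exportLanguageFile()`, `loadLanguageFile()`, `translations()` and this `__()` are illustrative names, not an existing library.

```php
<?php
// Hypothetical export: $rows stands in for a DB table with the columns
// page, section, var, lang, text.
function exportLanguageFile(array $rows, string $lang, string $file): void
{
    $texts = [];
    foreach ($rows as $row) {
        if ($row['lang'] === $lang) {
            $texts[$row['page']][$row['section']][$row['var']] = $row['text'];
        }
    }
    // One serialized array per language; LOCK_EX avoids concurrent writes.
    file_put_contents($file, serialize($texts), LOCK_EX);
}

// Holds the texts loaded for the current request.
function translations(?array $set = null): array
{
    static $texts = [];
    if ($set !== null) {
        $texts = $set;
    }
    return $texts;
}

function loadLanguageFile(string $file): void
{
    $data = unserialize((string) file_get_contents($file), ['allowed_classes' => false]);
    translations(is_array($data) ? $data : []);
}

// Hypothetical __(): walks the nested array along the given keys and falls
// back to the last key (the variable name) when no text exists.
function __(string ...$path): string
{
    $node = translations();
    foreach ($path as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            return (string) end($path);
        }
        $node = $node[$key];
    }
    return is_string($node) ? $node : (string) end($path);
}

$rows = [
    ['page' => 'faq', 'section' => 'register', 'var' => 'lost_password_text',
     'lang' => 'de', 'text' => 'Passwort vergessen?'],
    ['page' => 'faq', 'section' => 'register', 'var' => 'lost_password_text',
     'lang' => 'en', 'text' => 'Lost your password?'],
];
$file = sys_get_temp_dir() . '/lang_de.ser';
exportLanguageFile($rows, 'de', $file);
loadLanguageFile($file);
echo __('faq', 'register', 'lost_password_text'), "\n"; // Passwort vergessen?
```

Falling back to the variable name rather than an empty string is a design choice: untranslated texts stay visible on the page, which makes gaps easy to spot.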
Advantages:
Maintenance. Maintaining the texts can be a lot easier for big projects. You can group variables by page, sections or other parts within your site, by simply adding a column to your database that defines to which part of the site this variable belongs. That way you can quickly pull up a list of all the variables used in e.g. the FAQ page.
Translating. You can display a variable with its translations in all the different languages on a single page. This might be useful for people who can translate texts into multiple languages at the same time. It might also help to see the other translations to get a feel for the context, so that the translation is as good as possible. You can also query the database to find out what has been translated and what hasn't. Maybe add timestamps to keep track of possibly outdated translations.
Access. This depends on who will be translating. You can wrap the CMS in a simple login to grant access to people from a translation agency if need be, and only allow them to change certain languages or even certain parts of the site. If this isn't an option, you can still output the data to a file that can be translated manually and import it later (although this might come with the same problems mentioned before). You can add one of the existing translations (English or another main language) as context for the translator.
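The timestamp idea from the "Translating" point also covers the original requirement: when the English text changes, the other translations are kept but flagged instead of reverting. A minimal sketch, assuming hypothetical rows with `var`, `lang`, `text` and `updated_at` (Unix seconds) and English (`en`) as the reference language; `translationStatus()` is an illustrative name.

```php
<?php
// Hypothetical status check: for each variable, compare every target
// language against the English reference row.
function translationStatus(array $rows, array $languages): array
{
    $byVar = [];
    foreach ($rows as $row) {
        $byVar[$row['var']][$row['lang']] = $row;
    }

    $status = [];
    foreach ($byVar as $var => $langs) {
        if (!isset($langs['en'])) {
            continue; // no English reference to compare against
        }
        foreach ($languages as $lang) {
            if (!isset($langs[$lang])) {
                $status[$var][$lang] = 'missing';
            } elseif ($langs[$lang]['updated_at'] < $langs['en']['updated_at']) {
                // English changed after this translation: keep it, but flag it.
                $status[$var][$lang] = 'outdated';
            } else {
                $status[$var][$lang] = 'ok';
            }
        }
    }
    return $status;
}

$rows = [
    ['var' => 'welcome', 'lang' => 'en', 'text' => 'Welcome back!', 'updated_at' => 200],
    ['var' => 'welcome', 'lang' => 'de', 'text' => 'Willkommen!',   'updated_at' => 100],
    ['var' => 'logout',  'lang' => 'en', 'text' => 'Log out',       'updated_at' => 100],
    ['var' => 'logout',  'lang' => 'de', 'text' => 'Abmelden',      'updated_at' => 150],
];
print_r(translationStatus($rows, ['de', 'fr']));
```

In a real setup the same comparison could run as a database query; the array version only shows the logic.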
All in all, I think you'll find that you have a lot more control over the translations this way, especially in the long run. I can't tell you anything about the speed or efficiency of this approach compared to the native gettext functions. But, depending on the size of the language files, I don't think there will be a big difference. If you group the variables by page or section, you can always include only the required parts.
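Including only the required parts could look like the sketch below, assuming the export writes one serialized file per language and section (for example `lang/de/faq.ser`). The directory layout and `sectionTexts()` are hypothetical; each file is read at most once per request.

```php
<?php
// Hypothetical lazy loader: reads lang/<lang>/<section>.ser the first time
// a section is requested and keeps it for the rest of the request.
function sectionTexts(string $dir, string $lang, string $section): array
{
    static $loaded = [];
    $key = "$dir/$lang/$section";
    if (!isset($loaded[$key])) {
        // Only allow simple names so a request cannot point at arbitrary files.
        foreach ([$lang, $section] as $part) {
            if (!preg_match('/^[a-z0-9_]+$/', $part)) {
                throw new InvalidArgumentException("Invalid name: $part");
            }
        }
        $file = "$dir/$lang/$section.ser";
        $data = is_file($file)
            ? unserialize((string) file_get_contents($file), ['allowed_classes' => false])
            : [];
        $loaded[$key] = is_array($data) ? $data : [];
    }
    return $loaded[$key];
}

// Demo: write one section file, then read it back.
$dir = sys_get_temp_dir() . '/lang_demo_' . uniqid();
mkdir("$dir/de", 0777, true);
file_put_contents("$dir/de/faq.ser", serialize(['register_q' => 'Wie registriere ich mich?']));

$faq = sectionTexts($dir, 'de', 'faq');
echo $faq['register_q'], "\n"; // Wie registriere ich mich?
```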
After some testing I finally decided to go more or less along the lines of Alec's combination of the second and third alternatives.
Gettext problem
I tried to set up the whole gettext system first to try it out, but it turned out to be much more complicated than I thought. The problem is that Windows and Unix systems use different language short names for setlocale(). For the moment I'm running my dev server on Windows with Wamp, while the final site will run on Linux. I went through a couple of dozen guides, forum threads, questions etc., restarting the server after each modification, but I couldn't find any easy way to get it set up properly. Additionally, gettext is not thread-safe, the server needs to be restarted (or a hack used) to update a language file, and there is no easy way of handling different versions of language files or of handling the original English text without modifying the source or using Mike's suggestion, which, as Alec pointed out, isn't optimal.
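For the Windows/Unix naming problem, setlocale() accepts several candidate names (as an array or extra arguments) and returns the first one the system accepts, or false. A minimal sketch: `setLanguageLocale()` is a hypothetical wrapper, and the candidate list is an assumption, since which names work depends on the locales installed on each machine. Note that the PHP manual states locale information is maintained per process, not per thread, which is the root of the thread-safety concern.

```php
<?php
// Hypothetical wrapper: try Unix-style and Windows-style locale names in one
// call; returns the name that was set, or null if none is available.
function setLanguageLocale(array $candidates): ?string
{
    $result = setlocale(LC_ALL, $candidates);
    return $result === false ? null : $result;
}

// Assumed candidates: Unix names first, then typical Windows-style names.
$locale = setLanguageLocale(['de_DE.UTF-8', 'de_DE', 'deu_deu', 'German_Germany', 'german']);
echo $locale ?? 'no German locale installed', "\n";
```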
Solution
So I ended up with what I think is the best solution, based on Alec's response:
Performance
I never got around to testing gettext, but according to the link Mike posted, the difference in performance between using an array and gettext is totally acceptable to me, given the benefits a custom system gives as described above. However, I compared using an array of 20 translated text strings against retrieving the same 20 text strings from a MySQL DB. It turned out that using an array included from a file was around 6 times faster than retrieving all 20 strings at once from the MySQL DB. It was not a really scientific benchmark and the results will surely vary on different systems and setups, but it clearly shows what I expected: using a DB is much slower than using an array directly, which is why I chose to generate cache files for the array instead of using the DB.
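One way to generate such cache files is to write each language's texts as a PHP file that returns the array, so a page needs a single `include` and no DB query. This is a sketch under assumptions, not the exact code from the linked test: `writeCacheFile()` and the file name are hypothetical. `var_export()` produces valid PHP source for the array, and `opcache_invalidate()` is only called if OPcache is loaded.

```php
<?php
// Hypothetical cache writer: turns an array of texts into a PHP file that
// returns that array.
function writeCacheFile(string $file, array $texts): void
{
    $code = "<?php\n// Generated file, do not edit by hand.\nreturn "
        . var_export($texts, true) . ";\n";

    // Write to a temporary file first, then rename it over the old one, so a
    // request never includes a half-written file (rename is atomic on the
    // same filesystem on POSIX systems).
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);

    // If OPcache is active, drop its cached copy so the new texts are used.
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($file, true);
    }
}

$file = sys_get_temp_dir() . '/texts_de.php';
writeCacheFile($file, ['hello' => 'Hallo', 'logout' => 'Abmelden']);

$texts = include $file;
echo $texts['hello'], "\n"; // Hallo
```

Compared with a serialized file, an included PHP file can be kept in memory by an opcode cache, so repeated requests avoid both the DB query and the unserialize step.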
As a comparison, I also tested how fast it was to output only simple echos with the same text. It turned out to be around 20 times faster than using arrays from an included file, but then it is not possible to translate without having different versions of the page for different languages, which defeats the purpose of dynamic pages. In that case it is better to use a good caching system as well.
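A caching system can recover most of that speed while keeping pages dynamic: render each page once per language, store the HTML, and serve the stored copy until it expires. A minimal sketch; `cachedPage()`, the file naming and the 300-second TTL are hypothetical choices.

```php
<?php
// Hypothetical per-language page cache: the rendered HTML is stored once per
// (page, language) pair and reused until it is older than $ttl seconds.
function cachedPage(string $dir, string $page, string $lang, int $ttl, callable $render): string
{
    $file = $dir . '/' . md5($page . '|' . $lang) . '.html';
    if (is_file($file) && filemtime($file) > time() - $ttl) {
        return (string) file_get_contents($file);
    }
    ob_start();
    $render($lang);                  // the callback echoes the page
    $html = (string) ob_get_clean();
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

$dir = sys_get_temp_dir() . '/page_cache_' . uniqid();
mkdir($dir);
$calls = 0;
$render = function (string $lang) use (&$calls): void {
    $calls++;
    echo $lang === 'de' ? '<h1>Hallo</h1>' : '<h1>Hello</h1>';
};

echo cachedPage($dir, '/index', 'de', 300, $render), "\n"; // renders the page
echo cachedPage($dir, '/index', 'de', 300, $render), "\n"; // served from the file
```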
Performance test source files:
PHP: http://pastie.org/964082
MySQL table: http://pastie.org/964115
It is surely not perfect, but it at least gives an idea of the performance differences.
Rather than using the English text as the keys, you can use arbitrary keys and also provide an English translation, e.g.:
gettext key is 'hello'
You then have your various language translations of this key, plus an English translation of it that is also 'hello'. If you later want to update the English version of the string, you can leave the key alone and just update the English translation.
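The principle can be shown with plain arrays standing in for the per-language catalogs (with real gettext these would be the .po/.mo files). This is a sketch of the idea, not gettext itself; `t()` is a hypothetical lookup helper.

```php
<?php
// Arrays standing in for per-language catalogs. The key 'hello' never
// changes; English is just one more translation of it.
$catalogs = [
    'en' => ['hello' => 'Hello'],
    'de' => ['hello' => 'Hallo'],
    'fr' => ['hello' => 'Bonjour'],
];

// Hypothetical lookup helper: falls back to the key if no text exists.
function t(array $catalogs, string $lang, string $key): string
{
    return $catalogs[$lang][$key] ?? $key;
}

// Rewording the English text only touches the 'en' catalog; the key and the
// German and French translations stay exactly as they were.
$catalogs['en']['hello'] = 'Hello there';

echo t($catalogs, 'en', 'hello'), "\n"; // Hello there
echo t($catalogs, 'de', 'hello'), "\n"; // Hallo
```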