Switching the gettext translated language with the original language

Posted 2024-10-12 07:43:28

I started my PHP application with all text in German, then used gettext to extract all strings and translate them to English.
So, now I have a .po file with all msgids in German and msgstrs in English. I want to switch them, so that my source code contains the English as msgids for two main reasons:

  1. More translators will know English, so it is only appropriate to serve them up a file with msgids in English. I could always switch the file before I give it out and after I receive it, but naaah.
  2. It would help me to write English object & function names and comments if the content text was also English. I'd like to do that, so the project is more open to other Open Source collaborators (more likely to know English than German).

I could do this manually and this is the sort of task where I anticipate it will take me more time to write an automated routine for it (because I'm very bad with shell scripts) than do it by hand. But I also anticipate despising every minute of manual computer labour (feels like an oxymoron, right?) like I always do.

Has someone done this before? I figured this would be a common problem, but couldn't find anything. Many thanks ahead.

Sample Problem:

<title><?=_('Routinen')?></title>

#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"

I thought I'd narrow the problem down. The switch in the .po-file is no issue of course, it is as simple as

preg_replace('/msgid "(.+)"\nmsgstr "(.+)"/', "msgid \"$2\"\nmsgstr \"$1\"", $str);

The problem for me is the routine that searches my project folder files for _('$msgid') and substitutes _('msgstr') while parsing the .po-file (which is probably not even the most elegant way, after all the .po-file contains comments which contain all file paths where the msgid occurs).


After fooling around with akirk's answer a little, I ran into some more problems.

  1. Because I have a mixture of _('xxx') and _("xxx") calls, I have to be careful about (un)escaping.
    • Double quotes " in msgids and msgstrs have to be unescaped, but the slashes can't be stripped, because it may be that the double quote was also escaped in PHP
    • Single quotes have to be escaped when they're replaced into PHP, but then they also have to be changed in the .po-file. Luckily for me, single quotes only appear in English text.
  2. msgids and msgstrs can span multiple lines, in which case they look like this:
    msgid ""
    "line 1\n"
    "line 2\n"
    msgstr ""
    "line 1\n"
    "line 2\n"
  3. plural forms are of course skipped at the moment, but in my case that's not an issue
  4. Poedit wants to remove strings as obsolete that seem to have been switched successfully, and I have no idea why this happens in (many) cases.

I'll have to stop working on this for tonight. Still, it seems that using a parser instead of regular expressions wouldn't be overkill.
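
For the multi-line and escaping problems listed above, a small line-based reader may be enough. The following is only a sketch under assumptions: `parse_po_entries` is a hypothetical name, the "Zeile" sample lines are made up, and plural forms (`msgid_plural`, `msgstr[n]`) and `msgctxt` are deliberately ignored. It relies on PHP's `stripcslashes()` to decode the C-style escapes that .po files use.

```php
<?php
// Hypothetical helper: read simple (non-plural) entries from .po text.
// Returns a list of ['msgid' => ..., 'msgstr' => ...] with escapes decoded.
function parse_po_entries(string $po): array
{
    $entries = [];
    $current = null; // entry currently being built
    $field = null;   // 'msgid' or 'msgstr', whichever keyword came last

    foreach (preg_split('/\r?\n/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^(msgid|msgstr)\s+"(.*)"$/', $line, $m)) {
            if ($m[1] === 'msgid') {
                // a msgid keyword always starts a new entry
                if ($current !== null) {
                    $entries[] = $current;
                }
                $current = ['msgid' => '', 'msgstr' => ''];
            }
            if ($current === null) {
                continue; // msgstr without a preceding msgid: ignore
            }
            $field = $m[1];
            $current[$field] .= stripcslashes($m[2]);
        } elseif ($field !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            // continuation line of a multi-line msgid or msgstr
            $current[$field] .= stripcslashes($m[1]);
        } else {
            // comment, blank line, or unsupported keyword (plurals, msgctxt)
            $field = null;
        }
    }
    if ($current !== null) {
        $entries[] = $current;
    }
    return $entries;
}

// Made-up sample in the multi-line format shown in the question.
$sample = <<<'PO'
msgid ""
"Zeile 1\n"
"Zeile 2\n"
msgstr ""
"line 1\n"
"line 2\n"
PO;
print_r(parse_po_entries($sample));
```

Entries with an empty msgid (the .po header) or an empty msgstr (untranslated or plural entries) still appear in the result, so a caller would probably want to filter those out before swapping.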

Comments (3)

属性 2024-10-19 07:43:28

I built on akirk's answer and wanted to preserve what I came up with as an answer here, in case somebody has the same problem.
It does not recurse into subdirectories, but that could easily be changed. Feel free to comment with improvements; I will be watching and editing this post.

$po = file_get_contents("locale/en_GB/LC_MESSAGES/messages.po");

$translations = array(); // german => english
$rawmsgids = array(); // find later
$msgidhits = array(); // record success
$msgstrs = array(); // find later

preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    $german = str_replace('\"','"',$match[1]); // unescape double quotes (could misfire if you escaped double quotes in PHP _("<a href=\"bla\">bla</a>") but in my case that was one case versus many)
    $english = str_replace('\"','"',$match[2]);


    $en_sq_e = str_replace("'","\'",$english); // escape single quotes

    $translations['_(\''. $german . '\''] = '_(\'' . $en_sq_e . '\'';
    $rawmsgids['_(\''. $german . '\''] = $match[1]; // find raw msgid with searchstr as key

    $translations['_("'. $match[1] . '"'] = '_("' . $match[2] . '"';
    $rawmsgids['_("'. $match[1] . '"'] = $match[1];

    $translations['__(\''. $german . '\''] = '__(\'' . $en_sq_e . '\'';
    $rawmsgids['__(\''. $german . '\''] = $match[1];

    $translations['__("'. $match[1] . '"'] = '__("' . $match[2] . '"';
    $rawmsgids['__("'. $match[1] . '"'] = $match[1];

    $msgstrs[$match[1]] = $match[2]; // msgid => msgstr
}


foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);

    $filehits = 0; // how many replacements per file

    foreach($translations AS $msgid => $msgstr) {
        $hits = 0;
        $code = str_replace($msgid,$msgstr,$code,$hits);
        $filehits += $hits;

        if($hits!=0) $msgidhits[$rawmsgids[$msgid]] = 1; // this serves to record if the msgid was found in at least one incarnation
        elseif(!isset($msgidhits[$rawmsgids[$msgid]])) $msgidhits[$rawmsgids[$msgid]] = 0;
    }
    // file_put_contents($file, $code); // be careful to test this first before doing the actual replace (and do use a version control system!) 
    echo "$file : $filehits <br>"; 
    echo $code;
}
/* debug */ 
$found = array_keys($msgidhits, 1, true);
foreach($found AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";

echo "Not Found: <br>";
$notfound = array_keys($msgidhits, 0, true);
foreach($notfound AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";

/*
following steps are still needed:
    * convert plurals (ngettext)
    * convert multi-line msgids and msgstrs (format mentioned in question)
    * resolve uniqueness conflict (msgids are unique, msgstrs are not), so you may have duplicate msgids (poedit finds these)
*/
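
The uniqueness conflict in the last TODO item can at least be detected before anything is swapped. This is a rough sketch, not part of the script above: `find_swap_conflicts` is a hypothetical name, the German/English pairs in the usage line are made up, and the input is assumed to be a msgid => msgstr map like `$msgstrs`.

```php
<?php
// Hypothetical helper: group the original msgids by their translation.
// $pairs maps msgid (German) => msgstr (English), like $msgstrs above.
function find_swap_conflicts(array $pairs): array
{
    $byTranslation = [];
    foreach ($pairs as $msgid => $msgstr) {
        if ($msgstr === '') {
            continue; // untranslated entries cannot be swapped anyway
        }
        $byTranslation[$msgstr][] = $msgid;
    }
    // keep only translations that more than one msgid maps to
    return array_filter($byTranslation, function ($ids) {
        return count($ids) > 1;
    });
}

// Made-up pairs: two German msgids share one English msgstr.
print_r(find_swap_conflicts([
    'Speichern' => 'Save',
    'Sichern'   => 'Save',
    'Routinen'  => 'Routines',
]));
```

Each key in the result is an English string that would become a duplicate msgid after the swap, and its value lists the German msgids that collide, so those entries can be merged or reworded by hand.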

等数载,海棠开 2024-10-19 07:43:28

See http://code.activestate.com/recipes/475109-regular-expression-for-python-string-literals/ for a good Python-based regular expression that finds string literals and takes escapes into account. Although it is written for Python, it might work quite well for multi-line strings and other corner cases.

See http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/poswap.html for a ready, out-of-the-box base language swapper for .po files.

For instance, the following command line converts a German-based Spanish translation into an English-based Spanish translation. You just have to ensure that your new base language (English) is 100% translated before starting the conversion:

poswap -i de-en.po -t de-es.po -o en-es.po

Finally, to swap an English .po file into a German .po file, use swappo:
http://manpages.ubuntu.com/manpages/hardy/man1/swappo.1.html

After swapping the files, some manual polishing of the resulting files might be required. For instance, headers might be broken and some duplicate texts might occur.

扎心 2024-10-19 07:43:28

So, if I understand you correctly, you'd like to replace all German gettext calls with English ones. To replace the contents in the directory, something like this could work.

$po = file_get_contents("translation.pot");
$translations = array(); // german => english
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $translations['_("'. $match[1] . '")'] = '_("' . $match[2] . '")';
    $translations['_(\''. $match[1] . '\')'] = '_(\'' . $match[2] . '\')';
}
foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);
    $code = str_replace(array_keys($translations), array_values($translations), $code);
    //file_put_contents($file, $code);
    echo $code; // be careful to test this first before doing the actual replace (and do use a version control system!)
}
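
Plain string replacement like the above depends on the exact quoting used in each call. As a possible alternative, PHP's own tokenizer (`token_get_all()`) can locate the string literal inside `_()` calls regardless of quote style. The sketch below is an assumption-laden illustration, not a tested migration tool: `swap_gettext_calls` is a hypothetical name, the map is expected to contain decoded msgid => msgstr pairs, decoding double-quoted literals with `stripcslashes()` is only an approximation of PHP's escape rules, and strings with interpolated variables or methods that happen to be named `_` are not handled.

```php
<?php
// Hypothetical helper: rewrite _('...') and __("...") calls using a
// decoded msgid => msgstr map. Replacements are emitted as single-quoted literals.
function swap_gettext_calls(string $code, array $map, array $functions = ['_', '__']): string
{
    $tokens = token_get_all($code);
    $n = count($tokens);
    $out = '';

    for ($i = 0; $i < $n; $i++) {
        $tok = $tokens[$i];
        $out .= is_array($tok) ? $tok[1] : $tok;

        if (!is_array($tok) || $tok[0] !== T_STRING || !in_array($tok[1], $functions, true)) {
            continue;
        }

        // Look ahead: optional whitespace, "(", optional whitespace, constant string.
        $j = $i + 1;
        $between = '';
        while ($j < $n && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $between .= $tokens[$j][1];
            $j++;
        }
        if ($j >= $n || $tokens[$j] !== '(') {
            continue;
        }
        $between .= '(';
        $j++;
        while ($j < $n && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $between .= $tokens[$j][1];
            $j++;
        }
        if ($j >= $n || !is_array($tokens[$j]) || $tokens[$j][0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue;
        }

        // Decode the literal to the text gettext actually sees.
        $literal = $tokens[$j][1];
        $body = substr($literal, 1, -1);
        $text = $literal[0] === "'"
            ? strtr($body, ['\\\\' => '\\', "\\'" => "'"])
            : stripcslashes($body); // approximation for double-quoted literals

        if (!isset($map[$text])) {
            continue; // unknown msgid: leave the call untouched
        }

        // Re-encode the translation as a single-quoted PHP literal.
        $out .= $between . "'" . strtr($map[$text], ['\\' => '\\\\', "'" => "\\'"]) . "'";
        $i = $j; // skip the tokens that were just rewritten
    }
    return $out;
}
```

It could be called as `swap_gettext_calls(file_get_contents($file), $map)` inside the loop above. To cover subdirectories, the file list could come from SPL's `RecursiveDirectoryIterator` instead of `glob()`, and as before the output is worth inspecting before writing anything back.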