Using gettext to translate longer texts (views and email templates)

Posted on 2024-12-14 05:37:21

I'm developing a multilingual PHP web application, and I've got long(-ish) texts that I need to translate with gettext. These are email templates (usually short, but still several lines) and parts of view templates (longer descriptive blocks of text). These texts would include some simple HTML (things like bold/italic for emphasis, probably a link here or there). The templates are PHP scripts whose output is captured.

The problem is that gettext seems very clumsy for handling longer texts. Longer texts would generally have more changes over time than short texts — I can either change the msgid and make sure to update it in all translations (could be lots of work and very error-prone when the msgid is long), or I can keep the msgid unchanged and modify only the translations (which would leave misleading outdated texts in the templates). Also, I've seen advice against including HTML in gettext strings, but avoiding it would break a single natural piece of text into lots of chunks, which will be an even bigger nightmare to translate and reassemble, and I've also seen advice against unnecessary splitting of gettext strings into separate msgids.

The other approach I see is to ignore gettext altogether for these longer texts, and to separate those blocks in external subtemplates for each locale, and just include the one for the current locale. The disadvantage is that I'm separating the translation effort between gettext .po files and separate templates located in a completely different location.

Since this application will be used as a starting point for other applications in the future, I'm trying to come up with the best approach for the long term. I need some advice for best practices in such scenarios. How have you implemented similar cases? What turned out to work and what turned out a bad idea?

Comments (3)

玩心态 2024-12-21 05:37:21

Here's the workflow I used on a very heavily trafficked site that had several dozen long-ish blocks of styled textual content, translated into six languages:

  1. Pick a text-based markup language (we used Markdown)
  2. For long strings, use fixed message IDs like "About_page_intro_markdown" that:
    • describe the intent of the text
    • make clear that it will be interpreted in Markdown format
  3. Have our app render "*_markdown" strings appropriately, making sure to allow only a few safe HTML tags
  4. Build a tool for translators that:
    • shows them their Markdown rendered in realtime (sort of like the Markdown dingus)
    • makes it easy for them to see the now-authoritative base language translation of the text (since that's no longer in the msgid)
  5. Teach translators how to use the new workflow
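
The rendering step (3) could be sketched roughly as below. This is a minimal illustration, not the code from the site described above: `render_message`, the array catalog, and the toy renderer are all hypothetical stand-ins. A real app would presumably wrap `gettext()` and a proper Markdown library instead. Note that `strip_tags()` keeps attributes on the tags it allows, so a dedicated HTML sanitizer would be the safer choice in production.

```php
<?php
declare(strict_types=1);

/**
 * Look up a message by fixed ID. IDs ending in "_markdown" are rendered
 * and reduced to a few allowed tags; everything else is escaped as text.
 *
 * $translate and $renderMarkdown are injected (hypothetical) callables so
 * the sketch stays self-contained.
 */
function render_message(string $id, callable $translate, callable $renderMarkdown): string
{
    $text = $translate($id);

    if (substr($id, -9) !== '_markdown') {
        // Ordinary short string: escape it like any other text.
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // Render, then drop every tag that is not on the allow list.
    // Caveat: strip_tags() does not remove attributes from allowed tags.
    return strip_tags($renderMarkdown($text), '<p><strong><em><a><br>');
}

// Stand-in for a .po catalog keyed by fixed message IDs (assumption).
$catalog = [
    'ok_button'                 => 'OK & weiter',
    'About_page_intro_markdown' => 'Wir sind **sehr** stolz. <script>alert(1)</script>',
];
$translate = static function (string $id) use ($catalog): string {
    return $catalog[$id] ?? $id; // gettext also falls back to the ID
};

// Toy renderer handling only **bold** and *italic* (assumption).
$toyMarkdown = static function (string $md): string {
    $html = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $md);
    $html = preg_replace('/\*(.+?)\*/s', '<em>$1</em>', $html);
    return '<p>' . $html . '</p>';
};

echo render_message('About_page_intro_markdown', $translate, $toyMarkdown), "\n";
```

Keeping the `_markdown` suffix in the ID means the call site never has to remember which strings need rendering; the ID itself carries that information.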

Pros of this workflow:

  • Message IDs don't change all the time
  • Because translators are editing in a safe higher-level syntax, it is hard to mess up the HTML
  • Non-technical translators found it very easy to write in Markdown, vs. HTML

Cons of this workflow:

  • Having static unchanging message IDs means changes in the text need to be transmitted out of band (which we'd do anyway, as long text can raise questions about tone or emphasis)

I'm very happy with the way this workflow operated for our website, and would absolutely recommend it, and use it again. It took a couple of days to get started, but it was easy to build, train, and launch.

Hope this helps, and good luck with your project.

平安喜乐 2024-12-21 05:37:21

I just had this particular problem, and I believe I solved it in an elegant way.

The problem: We wanted to use Gettext in PHP, and use primary language strings as translation keys. However, for large blocks of HTML (with h1, h2, p, a, etc...) I'd either have to:

  • Create a translation for each tag with content.

or

  • Put the entire block with tags in one translation.

Neither of those options appealed to me, so this is what I did:

  • Keep simple strings ("OK","Add","Confirm","My Awesome App") as regular Gettext .po entries, with the original text as the key
  • Write content (large text blocks) in markdown, and keep them in files.
    Example files would be /homepage/content.md (primary / source text), /homepage/content.da-DK.md, /homepage/content.de-DE.md

  • Write a class that fetches the content files (for the current locale) and parses it. I then used it like:

    <?=Template::getContent("homepage/content")?>
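
Since the original class is not available, here is a rough sketch of just the file lookup such a class might perform. The function name `resolve_content_file`, the `$baseDir` parameter, and the fallback to the source text are assumptions; only the file naming scheme comes from the description above.

```php
<?php
declare(strict_types=1);

/**
 * Return the content file for a locale, falling back to the primary text.
 * Naming follows the answer: "homepage/content.da-DK.md" for a translation,
 * "homepage/content.md" for the source language.
 */
function resolve_content_file(string $baseDir, string $name, string $locale): ?string
{
    $candidates = [];

    // Only accept locales shaped like "da-DK" so the value can never
    // smuggle path segments into the file name.
    if (preg_match('/^[a-z]{2}-[A-Z]{2}$/', $locale) === 1) {
        $candidates[] = "$baseDir/$name.$locale.md";
    }
    $candidates[] = "$baseDir/$name.md";

    foreach ($candidates as $path) {
        if (is_file($path)) {
            return $path;
        }
    }

    return null; // neither a translation nor the source text exists
}
```

Falling back to the source file means a missing translation shows the primary-language text rather than an error, which mirrors what gettext does for an untranslated msgid.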

However, what about dynamic large text? Simple. Use a templating engine. I decided on Smarty, and used it in my Template class.

I could now use templating logic.. within markdown! How awesome is that?!

Then came the tricky part..

For content to look good, at times you need to structure your HTML differently. Consider a campaign area with 3 "feature boxes" beneath it. The easy solution: Have a file for the campaign area, and one for each of the 3 boxes.

But I could do better than that.

I wrote a quick block parser, so I would write all the content in one file, and then render each block separately.

Example file:

[block campaign]
Buy this now!
=============

Blaaaah... And a smarty tag: {$cool}
[/block]

[block feature 1]
Feature 1
---------

asdasd you get it..
[/block]

[block feature 2] ...

And this is how I would render them in the markup:

<?php 
// At the top of the document...

// Class handles locale. :)
$template = Template::getContent("homepage/content", [
    "cool" => "Smarty variable! AWESOME!"
]);
?>

...

<title><?=_("My Awesome App")?></title>    

...

<div class="hero">
   <!-- Template data already processed! :) -->
   <?=$template->renderBlock("campaign")?>
</div>
<div class="featurebox">
   <?=$template->renderBlock("feature 1")?>
</div>
<div class="featurebox">
   <?=$template->renderBlock("feature 2")?>
</div>

I'm afraid I can't provide any source code, as this was for a company project, but I hope you get the idea.
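
For readers who want a starting point, a block parser for the `[block name] ... [/block]` format shown above might look something like this. It is not the author's code: `parse_blocks` is a hypothetical name, and the sketch only splits the file into raw blocks, leaving the Smarty and Markdown passes out.

```php
<?php
declare(strict_types=1);

/**
 * Split a content file into named blocks.
 * Returns an array mapping block name => raw block body (still unrendered).
 */
function parse_blocks(string $source): array
{
    $blocks = [];

    // "[block NAME]" and "[/block]" must each start a line; the body is
    // matched lazily so consecutive blocks stay separate.
    $pattern = '/^\[block ([^\]\r\n]+)\]\R(.*?)^\[\/block\]/ms';

    if (preg_match_all($pattern, $source, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $blocks[trim($m[1])] = rtrim($m[2]);
        }
    }

    return $blocks;
}

// Nowdoc syntax keeps "{$cool}" literal instead of interpolating it.
$source = <<<'MD'
[block campaign]
Buy this now!
=============

Blaaaah... And a smarty tag: {$cool}
[/block]

[block feature 1]
Feature 1
---------

asdasd you get it..
[/block]
MD;

$blocks = parse_blocks($source);
echo implode(', ', array_keys($blocks)), "\n";
```

A `renderBlock()` method could then look up the requested name in this array and pass the body through the template engine and the Markdown renderer.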

混浊又暗下来 2024-12-21 05:37:21

gettext wasn't really designed for translating large pieces of text.

fwiw I've included basic HTML (strong, a, etc) in gettext strings as I was confident our translators knew what they were doing (mostly right) and that the translations would be well tested.
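
One way to keep that manageable, sketched here as an assumption rather than what was actually done, is to leave the simple tags in the msgid and pass variable parts in as escaped `sprintf` arguments. The helper name `format_html_message` and the sample strings are hypothetical; in a real template the first argument would come from `_()`.

```php
<?php
declare(strict_types=1);

/**
 * Fill a translated string that contains basic HTML with escaped values.
 * The translator owns the markup in $translated; only the inserted
 * values are escaped.
 */
function format_html_message(string $translated, string ...$args): string
{
    $escaped = array_map(
        static function (string $arg): string {
            return htmlspecialchars($arg, ENT_QUOTES, 'UTF-8');
        },
        $args
    );

    return vsprintf($translated, $escaped);
}

// Stand-in for what _() might return for a German locale (assumption).
// Numbered placeholders let the translator reorder the arguments.
$translated = 'Hallo <strong>%2$s</strong>, bitte <a href="%1$s">bestätigen Sie Ihre Adresse</a>.';

echo format_html_message($translated, 'https://example.com/confirm?a=1&b=2', 'Ann <admin>'), "\n";
```

The numbered placeholders (`%1$s`, `%2$s`) matter here because word order differs between languages, and the markup travels with the sentence instead of being reassembled from fragments.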

I've also tried breaking the text up into one string per paragraph. It's a rough fit, because it looks odd if one paragraph of English is left in the middle of otherwise translated text. So whenever one of those strings changed, we had to wait for translations before releasing a new version, which slowed us down. On the plus side, it's easy for translators to see which part of the text has changed. This approach worked well for the one application I've tried it with.

Splitting some text out into external locations also worked, but it caused management overhead: rather than just a .po file or two, there was a whole bunch of other text that had to be manually compared to the English version and updated accordingly. This is doable if you remember to give your translators notes explaining where and what the differences in the English version are.

I'm still not sold on either approach myself.
