Data format for book translation
I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to which parts of the original.
I could basically create a simple XML-based markup language that would look something like this:
<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>
Now, that would probably have its benefits, but I don't think editing it would be much fun.
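For comparison, reading the pairs back out of the XML variant is easy with Python's standard xml.etree.ElementTree module; the cost of this format is on the editing side, not the parsing side. A minimal sketch, assuming exactly the element and attribute names from the example; sentence_pairs is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# The question's sample document, embedded so the sketch runs on its own.
SAMPLE = """<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>"""


def sentence_pairs(xml_text, lang="fi"):
    """Return an (original, translation) tuple for every <sentence> element.

    The translation is None when the sentence has no <translation> in `lang`.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for sentence in root.iter("sentence"):
        original = sentence.findtext("original", default="")
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        match = sentence.find(f"translation[@lang='{lang}']")
        translation = None if match is None else (match.text or "")
        pairs.append((original, translation))
    return pairs


if __name__ == "__main__":
    for original, translation in sentence_pairs(SAMPLE):
        print(f"{original} -> {translation}")
```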
Another possibility that I can think of would be to keep the original and translation in separate files. If I add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.
original.txt:
This is an example sentence.
In this format editing is easy.
translation-fi.txt:
Tämä on esimerkkilause.
Tässä muodossa muokkaaminen on helppoa.
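Matching the two files by line number takes only a few lines of Python. A minimal sketch, assuming UTF-8 files named as in the example; paired_lines is a hypothetical helper name:

```python
import tempfile
from pathlib import Path


def paired_lines(original_path, translation_path):
    """Pair line N of the original with line N of the translation.

    Raises ValueError when the line counts differ, which catches a line that
    was accidentally added to or deleted from only one of the files.
    """
    original = Path(original_path).read_text(encoding="utf-8").splitlines()
    translation = Path(translation_path).read_text(encoding="utf-8").splitlines()
    if len(original) != len(translation):
        raise ValueError(
            f"line count mismatch: {len(original)} original lines, "
            f"{len(translation)} translated lines"
        )
    return list(zip(original, translation))


if __name__ == "__main__":
    # Recreate the two example files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp, "original.txt")
        translation = Path(tmp, "translation-fi.txt")
        original.write_text(
            "This is an example sentence.\nIn this format editing is easy.\n",
            encoding="utf-8",
        )
        translation.write_text(
            "Tämä on esimerkkilause.\nTässä muodossa muokkaaminen on helppoa.\n",
            encoding="utf-8",
        )
        for number, (src, dst) in enumerate(paired_lines(original, translation), start=1):
            print(f"{number}: {src} | {dst}")
```

An equal line count is only a sanity check, though: deleting one line and adding another further down keeps the counts equal while still misaligning every pair in between, which is the fragility the question is worried about.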
However, this doesn't seem very robust: it would be easy to mess up, since a single accidentally inserted or deleted line would shift every pair after it. Probably someone has better ideas. Thus the question:
What would be the best data format for making a book translation with a text editor?
EDIT: added the vim tag, since I'd prefer to do this with vim and believe that some vim guru might have ideas.
EDIT2: started a bounty on this. I'm currently leaning toward the second idea I described, but I hope to get something about as easy to edit (and quite easy to implement) but more robust.
One thought: if you keep each translatable chunk (one or more sentences) on its own line, vim's scrollbind and cursorbind options and a simple vertical split would help you keep the chunks "synchronized". It looks very much like what vimdiff does by default. The files should then have the same number of lines, and you don't even need to switch windows! But this isn't quite perfect, because wrapped lines tend to mess things up a little. If your translation wraps over two or three more virtual lines than the original text, the visual correlation fades, as the lines aren't one-to-one anymore. I couldn't find a solution or a script for fixing that behavior.
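To see how wrapping breaks the row-for-row alignment, here is a rough Python estimate of how far each translated line drifts on screen from its original in the other window. It assumes Vim's 'wrap' with 'linebreak' off (a line simply breaks every width cells) and one screen cell per character, which does not hold for tabs or wide characters; screen_rows and row_drift are hypothetical names:

```python
import math


def screen_rows(line, width):
    """Approximate the screen rows Vim uses for one line with 'wrap' set.

    Assumes 'nolinebreak' and one cell per character; an empty line still
    occupies one row.
    """
    return max(1, math.ceil(len(line) / width))


def row_drift(original_lines, translated_lines, width):
    """For each line pair, how many rows lower (positive) or higher (negative)
    the translated line starts on screen than its original."""
    drift = []
    offset = 0
    for src, dst in zip(original_lines, translated_lines):
        drift.append(offset)
        offset += screen_rows(dst, width) - screen_rows(src, width)
    return drift


if __name__ == "__main__":
    original = ["This is an example sentence.", "In this format editing is easy."]
    translation = ["Tämä on esimerkkilause.", "Tässä muodossa muokkaaminen on helppoa."]
    print(row_drift(original, translation, 30))  # [0, 0]
    print(row_drift(original, translation, 25))  # [0, -1]
```

At width 30 the example pairs still line up; at width 25 the first original wraps while its translation does not, so the second translation already starts one row higher than its original, and the offset keeps accumulating over a longer text.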
Another suggestion I would propose is to interleave the translation with the original. This is close to the diff approach Benoit suggested. After the original is split up into chunks (one chunk per line), I would prepend >> or similar to every line. The translation of a chunk would then begin with o (opening a new line below the chunk). The file would look like this:
And I would enhance the readability with :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. It would probably be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd map <C-j> to 2j and <C-k> to 2k to allow easy jumping between the parts that matter.
Pros for this latter approach also include that you could wrap things at 80 columns if you feel like I do :) It would still be trivial to write <C-j/k> mappings that jump between translations.
Cons: buffer completion suffers, as it now completes both original and translated words. Hopefully English words won't occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.
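A rough Python sketch of the interleaved layout, assuming the original is already split into one chunk per line and original lines are marked with ">> " (the answer leaves the exact marker open). It builds the skeleton, recovers the pairs even when a translation is wrapped over several lines, and does the final clean-up that grep -v '^>>' would do. The function names are hypothetical:

```python
PREFIX = ">>"  # marks lines that belong to the original text


def interleave(original_chunks):
    """Build the skeleton file: every original chunk on its own '>> ' line."""
    return "".join(f"{PREFIX} {chunk}\n" for chunk in original_chunks)


def pairs(text):
    """Recover (original, translation) pairs. A translation may be wrapped over
    several physical lines; its lines are joined with single spaces."""
    result = []
    for line in text.splitlines():
        if line.startswith(PREFIX):
            result.append((line[len(PREFIX):].strip(), []))
        elif result and line.strip():
            result[-1][1].append(line.strip())
    return [(original, " ".join(parts)) for original, parts in result]


def strip_originals(text):
    """Drop every line starting with '>>', like grep -v '^>>' would."""
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not line.startswith(PREFIX)
    )


if __name__ == "__main__":
    print(interleave(["This is an example sentence.", "In this format editing is easy."]), end="")
    # The same buffer after translating, with the second translation wrapped:
    done = (
        ">> This is an example sentence.\n"
        "Tämä on esimerkkilause.\n"
        ">> In this format editing is easy.\n"
        "Tässä muodossa muokkaaminen\n"
        "on helppoa.\n"
    )
    print(pairs(done))
    print(strip_originals(done), end="")
```

One caveat of this convention: a translated line that itself happens to begin with >> would be mistaken for original text, so the marker should be something that cannot occur in the translation.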
Why not use a simplified diff format?
@@ parts
Assuming you want to keep the one-to-one relationship between the original text and the translated text, a database table makes the most sense.
You'd have one table with the following columns:
You'd need a process to load the original text, and a process to show you one line of the original text and let you type the translated text. Perhaps the second process could show you 5 lines (the 2 before, the line you want to translate, and the 2 after) to give you context.
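A minimal sketch of the two processes using Python's built-in sqlite3 module. The answer's own column list was not preserved, so the schema here (line_no, original, translation) is an assumption, as are all function names:

```python
import sqlite3

# Hypothetical schema: the answer's own column list was not preserved.
SCHEMA = """
CREATE TABLE IF NOT EXISTS lines (
    line_no     INTEGER PRIMARY KEY,  -- position among the non-empty original lines
    original    TEXT NOT NULL,
    translation TEXT                  -- NULL until the line is translated
)
"""


def load_original(conn, text):
    """Process 1: store the original text, one row per non-empty line."""
    conn.execute(SCHEMA)
    lines = [line for line in text.splitlines() if line.strip()]
    conn.executemany(
        "INSERT INTO lines (line_no, original) VALUES (?, ?)",
        enumerate(lines, start=1),
    )
    conn.commit()


def context(conn, line_no, radius=2):
    """Process 2, read side: the line to translate plus `radius` lines on each side."""
    return conn.execute(
        "SELECT line_no, original, translation FROM lines "
        "WHERE line_no BETWEEN ? AND ? ORDER BY line_no",
        (line_no - radius, line_no + radius),
    ).fetchall()


def save_translation(conn, line_no, translation):
    """Process 2, write side: store the translation typed for one line."""
    conn.execute(
        "UPDATE lines SET translation = ? WHERE line_no = ?", (translation, line_no)
    )
    conn.commit()


def next_untranslated(conn):
    """Lowest line number still lacking a translation, or None when done."""
    return conn.execute(
        "SELECT MIN(line_no) FROM lines WHERE translation IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_original(conn, "This is an example sentence.\nIn this format editing is easy.\n")
    current = next_untranslated(conn)
    for row in context(conn, current):
        print(row)
    save_translation(conn, current, "Tämä on esimerkkilause.")
    print(next_untranslated(conn))
```

Because each translation is stored in the same row as its original, the pairing cannot drift the way two separate line-aligned files can; the trade-off is that the text is no longer edited directly in vim.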