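Creating gettext binary MO files with Java
I am trying to write a utility that parses gettext PO files and generates binary MO files. The parser is simple (my company does not use fuzzy entries, plurals, etc., just msgid/msgstr), but the generator does not work.
Here is the description of the MO file format, here is the original generator source (it is in C), and I also found a PHP script (https://github.com/josscrowcroft/php.mo/blob/master/php-mo.php).
My code: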
    public void writeFile(String filename, Map<String, String> polines) throws FileNotFoundException, IOException {
        DataOutputStream os = new DataOutputStream(new FileOutputStream(filename));
        HashMap<String, String> bvc = new HashMap<String, String>();
        TreeMap<String, String> hash = new TreeMap(bvc);
        hash.putAll(polines);
        StringBuilder ids = new StringBuilder();
        StringBuilder strings = new StringBuilder();
        ArrayList<ArrayList> offsets = new ArrayList<ArrayList>();
        ArrayList<Integer> key_offsets = new ArrayList<Integer>();
        ArrayList<Integer> value_offsets = new ArrayList<Integer>();
        ArrayList<Integer> temp_offsets = new ArrayList<Integer>();
        for (Map.Entry<String, String> entry : hash.entrySet()) {
            String id = entry.getKey();
            String str = entry.getValue();
            ArrayList<Integer> offsetsItems = new ArrayList<Integer>();
            offsetsItems.add(ids.length());
            offsetsItems.add(id.length());
            offsetsItems.add(strings.length());
            offsetsItems.add(str.length());
            offsets.add((ArrayList) offsetsItems.clone());
            ids.append(id).append('\0');
            strings.append(str).append('\0');
        }
        Integer key_start = 7 * 4 + hash.size() * 4 * 4;
        Integer value_start = key_start + ids.length();
        Iterator e = offsets.iterator();
        while (e.hasNext()) {
            ArrayList<Integer> offEl = (ArrayList<Integer>) e.next();
            key_offsets.add(offEl.get(1));
            key_offsets.add(offEl.get(0) + key_start);
            value_offsets.add(offEl.get(3));
            value_offsets.add(offEl.get(2) + value_start);
        }
        temp_offsets.addAll(key_offsets);
        temp_offsets.addAll(value_offsets);
        os.writeByte(0xde);
        os.writeByte(0x12);
        os.writeByte(0x04);
        os.writeByte(0x95);
        os.writeByte(0x00);
        os.writeInt(hash.size() & 0xff);
        os.writeInt((7 * 4) & 0xff);
        os.writeInt((7 * 4 + hash.size() * 8) & 0xff);
        os.writeInt(0x00000000);
        os.writeInt(key_start & 0xff);
        Iterator offi = temp_offsets.iterator();
        while (offi.hasNext()) {
            Integer off = (Integer) offi.next();
            os.writeInt(off & 0xff);
        }
        os.writeUTF(ids.toString());
        os.writeUTF(strings.toString());
        os.close();
    }
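Everything up to the os.writeInt(key_start); line seems fine; the differences from the MO file generated by the original tool start after those bytes.
What is wrong? (Apart from my terrible English..)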
Comments (1)
When comparing your implementation with the documentation I noticed two things:
The revision, directly after the magic number, should be an int. This seems to work, probably because the extra writeByte ends up acting as padding. Using writeInt would be clearer, however.
The & 0xFF part in the writeInt calls is probably wrong. This operation is needed to convert a signed byte to its unsigned integer value; for positive integers it should not be needed.
For parsing the po files you could also have a look at the zanata/tennera project on GitHub.
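To make the second point concrete, here is a small illustration of what the mask does to an int compared with a byte. The class and method names (MaskDemo, masked, unsignedByte) are made up for this sketch and do not come from the question's code.

```java
// Illustration only: MaskDemo and its method names are hypothetical.
final class MaskDemo {
    // What the question's code does: keeps only the lowest 8 bits of an int.
    static int masked(int value) {
        return value & 0xff;
    }

    // The case the mask is meant for: reading a signed byte as an unsigned value.
    static int unsignedByte(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        System.out.println(masked(28));                // 28: values below 256 pass through
        System.out.println(masked(300));               // 44: 300 is 0x12C, the high bits are lost
        System.out.println(unsignedByte((byte) 0xde)); // 222 instead of -34
    }
}
```

So with the mask in place, any count, length or offset of 256 or more would silently be written as a smaller number.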
Edit: The writeUTF call is also problematic, since it prefixes the output with a two-byte length and mangles '\0' bytes using Java's modified UTF-8 encoding. You could replace it with a plain write of the UTF-8 encoded bytes.
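A minimal sketch of that difference, assuming the strings should end up in the file as plain UTF-8 with NUL terminators. WriteUtfDemo and its two helper methods are hypothetical names used only for this comparison; the stream calls themselves are standard java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: WriteUtfDemo and its method names are hypothetical.
final class WriteUtfDemo {
    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length prefix
    // followed by modified UTF-8, where '\0' is encoded as 0xC0 0x80.
    static byte[] viaWriteUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream os = new DataOutputStream(buf)) {
            os.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by encoding to standard UTF-8 and writing them as-is:
    // no prefix, and '\0' stays a single zero byte.
    static byte[] viaRawUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream os = new DataOutputStream(buf)) {
            os.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

For the input "a\0", the first helper yields five bytes (00 03 61 C0 80) and the second yields two (61 00), which is the form an MO reader expects.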
Another Edit: I could not let go of this code. There were further problems concerning string length in chars vs. UTF-8 bytes, and DataOutputStream writing in big-endian instead of little-endian. I think the following code should work; the difference is that the file produced by msgfmt contains an optional hash table to speed up access:
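The code from this answer is not preserved on this page. As a stand-in, and not as the answerer's original, here is a minimal sketch that applies the points above: little-endian integers via ByteBuffer, lengths and offsets counted in UTF-8 bytes, NUL-terminated strings, and no hash table (size 0). The names MoSketch and toMoBytes are hypothetical. It assumes the layout from the gettext documentation (a seven-int header followed by two tables of length/offset pairs) and assumes that the TreeMap's ordering matches the byte-wise ordering of msgids, which holds for ASCII keys.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: MoSketch and toMoBytes are hypothetical names.
final class MoSketch {
    private static final int MAGIC = 0x950412de;
    private static final int HEADER_SIZE = 7 * 4; // seven 32-bit header fields

    static byte[] toMoBytes(Map<String, String> messages) {
        // msgids must be sorted; TreeMap order is assumed to be adequate here.
        TreeMap<String, String> sorted = new TreeMap<>(messages);
        int n = sorted.size();
        List<byte[]> ids = new ArrayList<>();
        List<byte[]> strs = new ArrayList<>();
        int idBytes = 0;
        int strBytes = 0;
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            byte[] id = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] str = e.getValue().getBytes(StandardCharsets.UTF_8);
            ids.add(id);
            strs.add(str);
            idBytes += id.length + 1;   // +1 for the terminating NUL
            strBytes += str.length + 1;
        }
        int idTable = HEADER_SIZE;      // table of (length, offset) for msgids
        int strTable = idTable + 8 * n; // table of (length, offset) for msgstrs
        int idData = strTable + 8 * n;  // first msgid byte
        int strData = idData + idBytes; // first msgstr byte

        ByteBuffer out = ByteBuffer.allocate(strData + strBytes)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(MAGIC).putInt(0)     // magic number, revision 0
           .putInt(n).putInt(idTable).putInt(strTable)
           .putInt(0).putInt(idData);   // hash table size 0, nominal hash offset

        int pos = idData;
        for (byte[] id : ids) {         // lengths exclude the NUL terminator
            out.putInt(id.length).putInt(pos);
            pos += id.length + 1;
        }
        for (byte[] str : strs) {       // pos now equals strData
            out.putInt(str.length).putInt(pos);
            pos += str.length + 1;
        }
        for (byte[] id : ids) {
            out.put(id).put((byte) 0);
        }
        for (byte[] str : strs) {
            out.put(str).put((byte) 0);
        }
        return out.array();
    }
}
```

The result could be written out with something like Files.write(Paths.get("messages.mo"), MoSketch.toMoBytes(polines)). Real catalogs normally also carry the header entry with the empty msgid, which is where the charset is declared, so the parser should probably pass that entry through as well.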