How to reverse engineer binary file formats for compatibility purposes
I am working on file preparation software that lets translators work easily and efficiently with a wide range of file formats.
As far as text-based formats (xml, php, resource files,...) are concerned, my small preparation utility works fine, but a major problem for most translators is handling all kinds of proprietary binary formats (Framemaker, Publisher, Quark...).
These files are rarely requested and need to be opened in expensive applications (few freelancers can afford to buy $20,000 worth of software just to handle a few projects per year), and even then it is not convenient to work directly in those applications.
I would like to be able to read these files and extract the text in such a way that it can be translated and then re-imported into the original application with minimal effort, or even better, used to recreate a valid native binary file.
Does that sound doable?
Where can I find more information on handling binary file formats, and are there useful tools for this kind of job (besides regular hex editors)?
Thanks in advance.
Comments (3)
Of course reverse engineering is possible, but without format specs it will take a lot of work. I would look at the return on effort for supporting these 'rarely requested, very expensive' formats. You may be better off spending that effort improving the core functionality of your app.
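To give a sense of what the very first step of that work tends to look like, here is a minimal sketch in Python that scans a binary blob for runs of readable text and records where they sit. It is only an illustration, not a parser for any of the formats mentioned: the function name `find_text_runs`, the `min_len` threshold, and the assumption that text is stored as plain ASCII or UTF-16LE are all mine, and real files may compress, encode, or split their text differently.

```python
import re


def find_text_runs(data: bytes, min_len: int = 4):
    """Return sorted (offset, encoding, text) tuples for readable runs in data.

    Hypothetical helper: assumes text is stored uncompressed as printable
    ASCII or as UTF-16LE code units for the same character range.
    """
    runs = []

    # Printable ASCII bytes (space through tilde), min_len or more in a row.
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in ascii_pattern.finditer(data):
        runs.append((match.start(), "ascii", match.group().decode("ascii")))

    # UTF-16LE: each printable ASCII character is followed by a zero byte.
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in utf16_pattern.finditer(data):
        runs.append((match.start(), "utf-16-le", match.group().decode("utf-16-le")))

    return sorted(runs)


if __name__ == "__main__":
    # Made-up sample bytes standing in for a real document file.
    sample = b"\x00\x01Hello world\x00\xff\x02" + "Title".encode("utf-16-le") + b"\x00\x03"
    for offset, encoding, text in find_text_runs(sample):
        print(f"{offset:#06x} {encoding:10} {text!r}")
```

Finding the text is the easy part. Writing a translation back is where the missing spec hurts, because many binary formats store lengths and offsets elsewhere in the file, so a translated string of a different length will usually not drop in cleanly.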
Another angle is to contact the companies with these formats, explain your goal, explain that it helps their product, and if they don't see you as competition they might be willing to help.
I know that you want to reverse engineer them, but since these may be proprietary file formats you are looking at a very steep curve trying to decode them...
Some (I have written a few proprietary formats for internal use myself) have specific methods and objects written into them that serve some purpose other than holding the file contents: the kind of thing that would prove a newly created file is illegal.
Just my 2 cents, and I am no lawyer =>
Maybe you could pick a cheaper application that has import features for QuarkXPress. For example, InDesign should be able to read Quark documents. Then use the importing application to export to whatever format you need, maybe with the help of a plug-in.