Format/encoding of this binary data file
I'm attempting to write a program that integrates with Advent Axys, software for financial planners and the like. The product's site is here: http://www.advent.com/solutions/asset-managers-software/axys-platform
I need to write new entries into the price files, but much of their content is binary. I looked around online and didn't find much. I also emailed their support, but I doubt that will help.
I have a short dummy file and the printout that the program produces for that file. I ran the file through a Ruby script that prints each byte as a character if it is a word character or one of a few symbols, and as its numeric value otherwise. Here's the Ruby script:
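# Read the raw bytes; binread avoids newline and encoding translation.
pri = File.binread '062109_dummy.pri'
pri.each_byte do |byte|
  # Print word characters and a few symbols as-is, everything else as a decimal byte value.
  print byte.chr =~ /[\w!@#\$%\^&\*\(\)\-\\\/\+\.]/ ? byte.chr : " #{byte} "
end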
And output:
pri1.001 254 250 251 252 29 0 0 2 adusnok 0 0 0 0 0 0 0 0 0 33333s7@ 1 254 250 251 252 29 0 0 2 csusxom 0 0 0 0 0 0 0 0 0 H 225 z 20 174 GA@ 1 254 250 251 252 29 0 0 2 etusvv 0 0 0 0 0 0 0 0 0 0 246 (\ 143 194 213 F@ 1 254 250 251 252 29 0 0 2 fdusoakbx 0 0 0 0 0 0 0 174 G 225 z 20 174 (@ 1 254 250 251 252 29 0 0 2 oousfidde09 0 0 0 0 0 154 153 153 153 153 185 S@ 1 254 250 251 252 29 0 0 2 qpusfid_eqix 0 0 0 0 164 p 61 10 215 cL@ 1 254 250 251 252 29 0 0 2 vausvg_sc 0 0 0 0 0 0 0 )\ 143 194 245 248 P@ 1
Note that if a number has spaces around it, it is the numeric value of a byte; if it does not, the byte itself was the ASCII character for that digit.
I know that the strings of letters (like "adusnok") identify the stocks and the like. They are followed by zeroed bytes because the space for the symbols is fixed-size (which is why there are fewer 0s after a longer symbol). The sequence
@ 1 254 250 251 252 29 0 0 2
seems to signify the end of a record, coming right before the symbol for a new one. Alternatively, some of it could be a field that happens to hold the same value in all of these records, but not much else seems to be the same. Beyond that, I know basically nothing. I do have the printout of what the program thinks the file maps to. With 3 spaces separating each column, it is:
adus   nok   23.45   NOKIA CORP ADR   0.393   05/30/2008
csus   xom   34.56   EXXON MOBIL CORPORATION COM   1.68   06/10/2009
etus   vv   45.67   VANGUARD LRG CAP ETF US PRIME MKT 750   1.04   3/31/2009
There's more, but that should give you a pretty good idea. I think it's quite possible that the descriptions, and possibly other things, are stored in other files and just looked up. But I know that the prices are in this file, because these are price files and that's the whole point. So:
33333s7 => 23.45
H 225 z 20 174 GA => 34.56
246 (\ 143 194 213 F => 45.67
Note that, apart from the 3s and the 7 in the first one, all of the numbers there are byte values, not ASCII digits. Also note that these bytes could encode a little more than just the price, but they definitely include the price.
Any ideas? I'm not familiar with common binary encodings, but I wouldn't be surprised if they used a fairly common method.
Reverse engineering a binary format is risky if you are going to ship your reverse-engineered codec: the vendor may change the file format without warning. However, if you are bound and determined to do it:
One thing you could do is to look at the format for IEEE floating point numbers:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
And then, starting at the first byte in the file, read both 4 and 8 bytes of data. Convert the 4-byte chunk to a float and the 8-byte chunk to a double. Check whether either matches a value that you know is in the file. If so, you have probably found the offset of a price. Print it out, along with the offset. If not, advance your seek position by one byte and try again.
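As a rough illustration of that scan, here is a minimal Ruby sketch. The sample bytes are transcribed from the first two records of the dump in the question rather than read from a real file, and `record`, `find_prices`, and the tolerance are hypothetical names and choices, not anything taken from Axys. It tries each offset as a 4-byte float and an 8-byte double in both byte orders, using the `String#unpack1` directives `e`/`g` (single precision, little/big-endian) and `E`/`G` (double precision, little/big-endian).

```ruby
# Build one record from the byte values shown in the question's dump:
# a 9-byte prefix, the symbol zero-padded to 16 bytes, then 8 unknown bytes.
def record(symbol, tail_bytes)
  prefix = [1, 254, 250, 251, 252, 29, 0, 0, 2].pack('C*')
  prefix + symbol.b.ljust(16, "\0".b) + tail_bytes.pack('C*')
end

# "33333s7@" and "H 225 z 20 174 GA@" written out as decimal byte values.
SAMPLE = record('adusnok', [51, 51, 51, 51, 51, 115, 55, 64]) +
         record('csusxom', [72, 225, 122, 20, 174, 71, 65, 64])

# unpack directives to try, mapped to their width in bytes.
FORMATS = { 'e' => 4, 'g' => 4, 'E' => 8, 'G' => 8 }.freeze

# Return [offset, directive, price] for every window that decodes to a known price.
def find_prices(data, known_prices, tolerance = 0.00005)
  hits = []
  (0...data.bytesize).each do |offset|
    FORMATS.each do |directive, width|
      chunk = data.byteslice(offset, width)
      next if chunk.bytesize < width # window runs past the end of the data
      value = chunk.unpack1(directive)
      known_prices.each do |price|
        hits << [offset, directive, price] if (value - price).abs <= tolerance
      end
    end
  end
  hits
end

find_prices(SAMPLE, [23.45, 34.56]).each do |offset, directive, price|
  puts "offset #{offset}: #{price} via unpack1('#{directive}')"
end
```

On this sample the hits are little-endian doubles (`E`) at offsets 25 and 58. That is at least consistent with the prices being stored as 8-byte IEEE doubles whose last byte is the `@` (64) that precedes each `1 254 250 ...` run, though a two-record sample is not proof.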
If you can find all the values that way, then you might be able to safely patch the binary files at runtime by performing a similar operation: looking for the prices that you know are there, and then modifying the price values in the right place.
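A patch along those lines might look like the following sketch. It assumes, based only on the sample bytes in the question, that each price is a little-endian 8-byte double located somewhere after its symbol. `record` and `patch_price` are hypothetical helpers, and a real script would read and write a backup copy of the `.pri` file instead of an in-memory sample.

```ruby
# Build one record from the byte values shown in the question's dump.
def record(symbol, tail_bytes)
  prefix = [1, 254, 250, 251, 252, 29, 0, 0, 2].pack('C*')
  prefix + symbol.b.ljust(16, "\0".b) + tail_bytes.pack('C*')
end

SAMPLE = record('adusnok', [51, 51, 51, 51, 51, 115, 55, 64]) +
         record('csusxom', [72, 225, 122, 20, 174, 71, 65, 64])

# Return a copy of data in which the first occurrence of old_price after
# the given symbol is overwritten with new_price.
# Assumption: prices are little-endian doubles (pack directive 'E').
def patch_price(data, symbol, old_price, new_price)
  patched = data.b # binary copy, so the caller's string is left untouched
  symbol_at = patched.index(symbol.b)
  raise ArgumentError, "symbol #{symbol} not found" if symbol_at.nil?

  # Search for the exact 8 bytes of the old price, starting at the symbol.
  price_at = patched.index([old_price].pack('E'), symbol_at)
  raise ArgumentError, "price #{old_price} not found after #{symbol}" if price_at.nil?

  patched[price_at, 8] = [new_price].pack('E') # same width, so nothing shifts
  patched
end

patched = patch_price(SAMPLE, 'csusxom', 34.56, 35.0)
puts patched.byteslice(58, 8).unpack1('E')
```

Matching on the symbol first narrows the search, but a plain substring match can still hit the wrong place (one symbol may be a prefix of another), so the result is worth checking against the program's own printout before trusting it.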
This isn't foolproof at all, because random sequences of data will sometimes match up. If you notice a definite distance between offsets, or some sigil that is always present, or perhaps even better, if you can find those offset values back in the file, you may have something modestly stable.
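One way to check for a fixed distance or a recurring sigil is sketched below. It treats the run 254 250 251 252 from the question's dump as a candidate marker (an assumption, since its actual meaning is unknown), lists where it occurs, and measures the gaps between occurrences. The sample is rebuilt from three records of the dump, and `record`, `marker_offsets`, `prices_after_markers`, and the +24 price offset are illustrative and specific to this sample.

```ruby
# Build one record from the byte values shown in the question's dump.
def record(symbol, tail_bytes)
  prefix = [1, 254, 250, 251, 252, 29, 0, 0, 2].pack('C*')
  prefix + symbol.b.ljust(16, "\0".b) + tail_bytes.pack('C*')
end

SAMPLE = record('adusnok', [51, 51, 51, 51, 51, 115, 55, 64]) +
         record('csusxom', [72, 225, 122, 20, 174, 71, 65, 64]) +
         record('etusvv', [246, 40, 92, 143, 194, 213, 70, 64])

# Candidate sigil: the byte run that recurs before every symbol in the dump.
MARKER = [254, 250, 251, 252].pack('C*')

# Every byte offset at which marker occurs in data.
def marker_offsets(data, marker)
  offsets = []
  pos = 0
  while (found = data.index(marker, pos))
    offsets << found
    pos = found + 1
  end
  offsets
end

# Decode a little-endian double at a fixed distance after each marker.
# The distance of 24 bytes is read off this sample, not a documented layout.
def prices_after_markers(data, marker, distance = 24)
  marker_offsets(data, marker).map { |off| data.byteslice(off + distance, 8).unpack1('E') }
end

offsets = marker_offsets(SAMPLE, MARKER)
gaps = offsets.each_cons(2).map { |a, b| b - a }
puts "marker offsets: #{offsets.inspect}"
puts "gaps: #{gaps.inspect}"
puts "prices: #{prices_after_markers(SAMPLE, MARKER).inspect}"
```

If the gaps come out constant on the real file, as they do on this sample, the records are probably fixed-width, and reading or writing a price at a fixed distance from each marker is far less likely to hit a coincidental match than a blind scan.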