Determining an unknown data format in binary data with PHP
I have binary data containing a mix of uint32 values and null-terminated strings. I know the size of an individual data set (every set shares the same format), but not the actual format.
I've been reading the data with unpack, using the following functions:
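function read_uint32( $fh ){
    // Read 4 bytes and decode them as an unsigned 32-bit integer.
    // 'L' uses machine byte order; 'V' (little-endian) or 'N' (big-endian) pins the order if it is known.
    $bytes = fread( $fh, 4 );
    if( $bytes === false || strlen( $bytes ) < 4 ){
        return false; // fewer than 4 bytes left
    }
    $return_value = unpack( 'L', $bytes );
    return $return_value[1];
}
function read_string( $fh ){
    // Collect bytes up to, but not including, the null terminator (or end of file).
    $return_string = '';
    while( true ){
        $char = fread( $fh, 1 );
        if( $char === false || $char === '' || $char === "\0" ){
            break;
        }
        $return_string .= $char;
    }
    return $return_string;
}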
I then basically try both functions and check whether the data makes sense as a string; if not, it's probably an int. Is there an easier way to go about this?
Thanks.
Comments (1)
Well, I think your approach is okay.
If you only get ASCII strings it's quite easy, as the highest bit will always be 0 (or 1 in some strange cases...). Analyzing some bytes from the file and looking at the distribution will probably tell you whether it's ASCII or something binary.
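As a rough illustration of the high-bit idea, here is a minimal sketch. The name high_bit_ratio() is made up for this example, and the sample data is invented; it simply reports what fraction of the bytes in a chunk have their highest bit set. Plain ASCII text should give 0, while packed integers often give something higher.

```php
<?php
// Hypothetical helper (not a PHP built-in): fraction of bytes in $data
// whose highest bit (0x80) is set.
function high_bit_ratio(string $data): float
{
    $len = strlen($data);
    if ($len === 0) {
        return 0.0; // nothing to analyze
    }
    $high = 0;
    for ($i = 0; $i < $len; $i++) {
        if (ord($data[$i]) & 0x80) {
            $high++;
        }
    }
    return $high / $len;
}

// Invented sample data: pure ASCII versus a uint32 followed by a string.
$text  = "hello world";
$mixed = pack('V', 4000000000) . "abc\0";
echo high_bit_ratio($text), "\n";
echo high_bit_ratio($mixed), "\n";
```

A ratio of 0 does not prove the chunk is a string, since small integers also consist of low bytes only, so this is better at ruling strings out than at confirming them.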
If you have a different encoding like UTF-8 or something, it's really a pain.
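For UTF-8 specifically, one thing that might take some of the pain out is checking whether a candidate chunk is valid UTF-8 at all. The sketch below assumes the common idiom of running an empty pattern with the /u modifier; looks_like_utf8() is a hypothetical name for this example.

```php
<?php
// Hypothetical helper: true if $data is a valid UTF-8 byte sequence.
// With the /u modifier, PCRE validates the subject as UTF-8 and
// preg_match() returns false when the bytes are malformed.
function looks_like_utf8(string $data): bool
{
    return preg_match('//u', $data) === 1;
}

var_dump(looks_like_utf8("caf\xC3\xA9"));           // valid two-byte sequence
var_dump(looks_like_utf8(pack('V', 4000000000)));   // stray lead byte 0xEE
```

Valid UTF-8 is only a hint: a small integer such as 5 packs to bytes that are also valid UTF-8, so this can reject a string guess but not confirm one.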
You could probably look for recurring CR/LF chars, or filter out the range 0-31 and only let tab, CR, LF and FF slip through. Then analyze the first X bytes and compare the ratio of control characters other than tab, CR, LF and FF to all other characters. This should work for any ASCII-compatible encoding, as the ASCII range is standardized...
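The control-character filter could be sketched roughly like this. The function name control_char_ratio() and the default sample size of 512 bytes are assumptions for illustration, not anything from the original post.

```php
<?php
// Hypothetical helper: share of "suspicious" control bytes (0-31, except
// tab, LF, FF and CR) among the first $limit bytes of $data.
function control_char_ratio(string $data, int $limit = 512): float
{
    $sample = substr($data, 0, $limit);
    $len = strlen($sample);
    if ($len === 0) {
        return 0.0; // nothing to analyze
    }
    $allowed = [9, 10, 12, 13]; // tab, LF, FF, CR
    $suspicious = 0;
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($sample[$i]);
        if ($byte < 32 && !in_array($byte, $allowed, true)) {
            $suspicious++;
        }
    }
    return $suspicious / $len;
}

echo control_char_ratio("line one\r\nline two\tend"), "\n";
echo control_char_ratio("\x01\x02ab"), "\n";
```

Note that the null terminator itself is byte 0 and counts as suspicious here, so it may make sense to strip it before scoring a candidate string. What ratio counts as "binary" is a judgment call that depends on the data.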
To determine the actual file type, it's probably best to leave this to the OS layer and simply call file from the shell, or use the PHP functions to get the mimetype...
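On the PHP side, one way to get the mimetype is the fileinfo extension, assuming it is enabled in your build. The wrapper detect_mime_type() below is a hypothetical name for this sketch.

```php
<?php
// Hypothetical wrapper around PHP's fileinfo extension (must be enabled).
// Returns the detected MIME type, or null if the file cannot be inspected.
function detect_mime_type(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);
    return $type === false ? null : $type;
}

// Usage with a throwaway file containing invented sample text.
$tmp = tempnam(sys_get_temp_dir(), 'fmt');
file_put_contents($tmp, "hello world\n");
var_dump(detect_mime_type($tmp));
unlink($tmp);
```

This identifies the file as a whole; for a custom record layout it will most likely just report a generic binary type, so it complements the per-field guessing rather than replacing it.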