Searching for a hex string in a file in PHP?
I'm currently using the following two methods in my class to get the job done:
function xseek($h,$pos){
    rewind($h);
    if($pos>0)
        fread($h,$pos);
}
function find($str){
    return $this->startingindex($this->name,$str);
}
function startingindex($a,$b){
    $lim = 1 + filesize($a) - strlen($b)/2;
    $h = fopen($a,"rb");
    rewind($h);
    for($i=0;$i<$lim;$i++){
        $this->xseek($h,$i);
        if($b==strtoupper(bin2hex(fread($h,strlen($b)/2)))){
            fclose($h);
            return $i;
        }
    }
    fclose($h);
    return -1;
}
I realize this is quite inefficient, especially for PHP, but I'm not allowed any other language on my hosting plan.
I ran a couple of tests, and when the hex string is towards the beginning of the file, it runs quickly and returns the offset. When the hex string isn't found, however, the page hangs for a while. This kills me inside because the last time I tested with PHP and had hanging pages, my webhost shut my site down for 24 hours due to too much CPU time.
Is there a better way to accomplish this (finding a hex string's offset in a file)? Are there certain aspects of this that could be improved to speed up execution?
I would read the entire contents of the file into one hex string and use strrpos, but I was getting errors about maximum memory being exceeded. Would this be a better method if I chopped the file up and searched large pieces with strrpos?
edit:
To specify, I'm dealing with a settings file for a game. The settings and their values are in a block where there is a 32-bit int before the setting, then the setting, a 32-bit int before the value, and then the value. Both ints represent the lengths of the following strings. For example, if the setting was "test" and the value was "0", it would look like (in hex): 00000004746573740000000130. Now that you mention it, this does seem like a bad way to go about it. What would you recommend?
edit 2:
I tried a file that was below the maximum memory I'm allowed and tried strrpos, but it was much slower than the way I've been trying.
edit 3: in reply to Charles:
What's unknown is the length of the settings block and where it starts. What I do know is what the first and last settings USUALLY are. I've been using these searching methods to find the location of the first and last setting and determine the length of the settings block. I also know where the parent block starts. The settings block is generally no more than 50 bytes into its parent, so I could start the search for the first setting there and limit how far it will search. The problem is that I also need to find the last setting. The length of the settings block is variable and could be any length. I could read the file the way I assume the game does, by reading the size of the setting, reading the setting, reading the size of the value, reading the value, etc. until I reached a byte with value -1, or FF in hex. Would a combination of limiting the search for the first setting and reading the settings properly make this much more efficient?
Comments (1)
You have a lot of garbage code. For example, xseek() does almost nothing useful, because it rewinds and reads from the beginning of the file every time. Furthermore, why do you need to read something if you are not returning it? Maybe you are looking for fseek()?
If you need to find a hex string in a binary file, it may be better to use something like this: http://pastebin.com/fpDBdsvV (tell me if there are any bugs or problems).
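As a rough illustration of the chunked search raised in the question, here is a minimal sketch that reads the file in fixed-size pieces and searches the raw bytes with strpos(). The function name findHexInFile() and the $chunkSize parameter are made up for this example. It assumes the pattern is a valid even-length hex string, and it carries the last few bytes of each chunk over to the next one so that a match spanning two chunks is not missed.

```php
<?php
/**
 * Return the byte offset of the first occurrence of a hex-encoded pattern
 * in a file, or -1 if it is not found.
 * findHexInFile() and $chunkSize are illustrative names, not from the post.
 */
function findHexInFile(string $path, string $hex, int $chunkSize = 65536): int
{
    $needle = hex2bin($hex);              // compare raw bytes, not hex text
    if ($needle === false || $needle === '') {
        return -1;                        // invalid or empty pattern
    }
    $h = fopen($path, 'rb');
    if ($h === false) {
        return -1;
    }
    $overlap = strlen($needle) - 1;       // bytes carried over between chunks
    $tail = '';                           // end of the previous buffer
    $base = 0;                            // file offset of $tail's first byte
    while (!feof($h)) {
        $chunk = fread($h, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buf = $tail . $chunk;
        $pos = strpos($buf, $needle);
        if ($pos !== false) {
            fclose($h);
            return $base + $pos;
        }
        // Keep the last $overlap bytes so a match that straddles
        // two chunks is still found on the next pass.
        $keep = min($overlap, strlen($buf));
        $tail = $keep > 0 ? substr($buf, -$keep) : '';
        $base += strlen($buf) - $keep;
    }
    fclose($h);
    return -1;
}
```

Each byte is read from disk once, and memory use stays near the chunk size, so a missing pattern costs one pass over the file instead of one pass per offset. A call might look like findHexInFile('settings.dat', '00000004746573740000000130'), where the file name is only a placeholder.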
But if you are parsing a game's settings file, I'd advise you to use fseek(), fread() and unpack() to seek to where a setting is, read the required bytes, and unpack them into PHP's variable types.
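A minimal sketch of that fseek()/fread()/unpack() approach follows. It rests on assumptions taken from the question rather than from any known file specification: the lengths are 32-bit big-endian integers (as the 00000004... example suggests), and the block ends at a 0xFF byte or at end of file. The helper names readSettings() and readString() are hypothetical.

```php
<?php
/**
 * Read length-prefixed name/value pairs starting at $offset.
 * Assumes big-endian 32-bit lengths and a 0xFF terminator byte;
 * both are guesses based on the question, not a documented format.
 */
function readSettings(string $path, int $offset): array
{
    $h = fopen($path, 'rb');
    if ($h === false) {
        return [];
    }
    fseek($h, $offset);                   // jump directly, no re-reading from 0
    $settings = [];
    while (true) {
        $name = readString($h);
        if ($name === null) {
            break;                        // terminator or end of file
        }
        $value = readString($h);
        if ($value === null) {
            break;                        // truncated pair
        }
        $settings[$name] = $value;
    }
    fclose($h);
    return $settings;
}

/** Read one 32-bit length followed by that many bytes, or null at the end. */
function readString($h): ?string
{
    $raw = fread($h, 4);
    if ($raw === false || strlen($raw) < 4 || $raw[0] === "\xFF") {
        return null;
    }
    $len = unpack('N', $raw)[1];          // 'N' = unsigned 32-bit big-endian
    if ($len === 0) {
        return '';                        // fread() rejects a zero length
    }
    $data = fread($h, $len);
    return ($data !== false && strlen($data) === $len) ? $data : null;
}
```

Reading the block this way visits each byte once and also yields the end of the settings block, so no search for the last setting is needed. For untrusted files it would be sensible to reject implausibly large lengths before calling fread().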