How can I efficiently find where a file leaves off compared with an earlier version of it?
I have a file that is constantly being added to (by a process beyond my control), and I capture that file every x seconds. I want to extract the new contents of the file (whatever was added since my previous capture) and work with it. Unfortunately the file contains nothing to indicate when it was last added to, and I can't write to it, so my only option is to store what I already know is in the file and compare that with the new version I have.
What I need to know is how best to do this. I'm using PHP, and I figured the simplest solution is to store the previous contents and then use explode() to work out what comes after them. That is (quite obviously) a terrible solution, because once the file gets large (1GB+) it's going to be hell to process.
One idea I had is to store the position of the final character and work from there. For example, if the last character was the 100th, I'd start from the 100th character on the next run. I'm not sure how I could do this, or whether it's even possible with PHP.
So my question is: what is the correct method for doing this, and how can I do it with PHP (if possible)? Functions or a general idea are fine. I'm good with the implementation, just not sure about the theory behind it.
Comments (1)
Assuming the file is only ever appended to, the simplest approach is to store the previous file size and use fseek(), or the offset parameter of file_get_contents, to move to where the old version of the file ended, then read from there.
To get this rolling for the first time, put 0 in last_position.temp so there are no errors or hard feelings. Hope this helps :)
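The approach described above could be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the function name readNewContent and its parameters are hypothetical, and it assumes the file only grows by appending. It also treats a missing state file as offset 0 and restarts from 0 if the file has shrunk, which are choices made for this sketch.

```php
<?php
// Hypothetical helper: returns the bytes appended to $path since the offset
// saved in $stateFile (e.g. last_position.temp), then saves the new offset.
function readNewContent(string $path, string $stateFile): string
{
    // Offset where the previous run stopped; 0 if no state has been saved yet.
    $last = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    clearstatcache(true, $path);      // filesize() results are cached otherwise
    $size = filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    if ($size < $last) {
        // Assumption: a smaller file means it was truncated or replaced.
        $last = 0;
    }
    if ($size === $last) {
        return '';                    // nothing new since the last capture
    }

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $last);            // jump to where the old version ended
    // Read only up to the size observed above, so the saved offset
    // always matches what was actually returned.
    $new = stream_get_contents($handle, $size - $last);
    fclose($handle);
    if ($new === false) {
        throw new RuntimeException("Cannot read $path");
    }

    file_put_contents($stateFile, (string) ($last + strlen($new)));
    return $new;
}
```

Calling readNewContent() with the captured file and last_position.temp every x seconds would return only the part added since the previous call. The same read could also be done with file_get_contents($path, false, null, $last), which takes the offset as its fourth argument. If a lot of data can arrive between captures, reading from the handle in fixed-size chunks with fread() would keep memory use bounded.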