PHP script gradually getting slower (file reader)
I have a script that, when put against a timer, gets progressively slower. It's fairly simple: all it does is read a line, check it, add it to the database, and then move on to the next line.
Here's the output of it gradually getting worse:
Record: #1,001 Memory: 1,355,360kb taking 1.84s
Record: #1,001 Memory: 1,355,360kb taking 1.84s
Record: #2,002 Memory: 1,355,192kb taking 2.12s
Record: #3,003 Memory: 1,355,192kb taking 2.39s
Record: #4,004 Memory: 1,355,192kb taking 2.65s
Record: #5,005 Memory: 1,355,200kb taking 2.94s
Record: #6,006 Memory: 1,355,376kb taking 3.28s
Record: #7,007 Memory: 1,355,176kb taking 3.56s
Record: #8,008 Memory: 1,355,408kb taking 3.81s
Record: #9,009 Memory: 1,355,464kb taking 4.07s
Record: #10,010 Memory: 1,355,392kb taking 4.32s
Record: #11,011 Memory: 1,355,352kb taking 4.63s
Record: #12,012 Memory: 1,355,376kb taking 4.90s
Record: #13,013 Memory: 1,355,200kb taking 5.14s
Record: #14,014 Memory: 1,355,184kb taking 5.43s
Record: #15,015 Memory: 1,355,344kb taking 5.72s
The file, unfortunately, is around 20 GB, so at this rate of increase I'll probably be dead by the time the whole thing is read. The code is (mainly) below, but I suspect it has something to do with fgets(), though I'm not sure what.
$handle = fopen($import_file, 'r');
while ($line = fgets($handle))
{
    // Decode the JSON record and hand it to the database layer.
    $data = json_decode($line);
    save_record($data, $line);
}
Thanks in advance!
EDIT:
Commenting out 'save_record($data, $line);' appears to make no difference to the slowdown.
4 Answers
Sometimes it is better to use system commands for reading these large files. I ran into something similar and here is a little trick I used:
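A minimal sketch of that kind of trick — assuming $import_file and save_record() from the question above — using wc -l to count the lines and sed, via exec(), to pull one record at a time:

$line_count = (int) exec('wc -l < ' . escapeshellarg($import_file));

for ($i = 1; $i <= $line_count; $i++) {
    // sed "N!d" deletes every line except line N, so exec() returns just that record.
    $line = exec('sed ' . escapeshellarg($i . '!d') . ' ' . escapeshellarg($import_file));

    $data = json_decode($line);
    save_record($data, $line);
}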
I would not recommend this with files that cannot be trusted, but it runs fast since it pulls one record at a time using the system. Hope this helps.
According to Leigh Purdie's comment, there are some performance issues on big files with fgets (http://php.net/manual/en/function.fgets.php). If your JSON objects are bigger than his test lines, you might hit the limits much faster. Use http://php.net/manual/en/function.stream-get-line.php and specify a length limit.
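For example, the loop from the question could be rewritten with stream_get_line() and an explicit cap — $import_file and save_record() are again the names from the question, and the 1 MB limit is an arbitrary assumption:

$handle = fopen($import_file, 'r');

// Read at most 1 MB per record (arbitrary cap), stopping at each newline.
while (($line = stream_get_line($handle, 1024 * 1024, "\n")) !== false) {
    $data = json_decode($line);
    save_record($data, $line);
}

fclose($handle);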
Alright, a performance problem. Obviously something is going quadratic when it shouldn't, or more to the point, something that should be constant-time seems to be linear in the number of records dealt with so far. The first question is what's the minimal scrap of code that exhibits the problem. I would want to know if you get the same problematic behavior when you comment out all but reading the file line by line. If so, then you'll need a language without that problem. (There are plenty.) Anyway, once you see the expected time characteristic, add statements back in one-by-one until your timing goes haywire, and you'll have identified the problem.
You instrumented something or other to get the timings. Make sure those can't cause a problem by executing them alone 15000 times or so.
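As a concrete starting point, a stripped-down sketch (using the question's $import_file) that times nothing but the line-by-line read — the 1,000-record reporting interval is an arbitrary choice:

$handle = fopen($import_file, 'r');
$start  = microtime(true);
$count  = 0;

while (($line = fgets($handle)) !== false) {
    $count++;
    // If even this bare loop slows down over time, the reader itself is the problem.
    if ($count % 1000 === 0) {
        printf("Record: #%s taking %.2fs\n", number_format($count), microtime(true) - $start);
        $start = microtime(true);
    }
}

fclose($handle);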
I found this question while trying to find a way to get through a 96G text file more quickly. The script I initially wrote took 15 hours to reach 0.1%...
I have tried some of the solutions suggested here, using stream_get_line, fgets and exec for sed. I ended up with a different approach that I thought I would share with anyone else stopping by this question.
Split the file up! :-)
On my FreeBSD box (the utility also exists for Linux and others) I have a command-line utility named 'split'.
So I ran:
Then I ended up with 5608 files in the /data/var/myfile-log/ directory, which could then all be processed one at a time with a command like:
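One way to drive that step from PHP — a sketch that assumes the chunks sit in /data/var/myfile-log/ and simply reuses the question's per-line processing on each chunk:

// Hypothetical driver: walk every chunk that split produced and import it.
foreach (glob('/data/var/myfile-log/*') as $chunk_file) {
    $handle = fopen($chunk_file, 'r');

    while (($line = fgets($handle)) !== false) {
        $data = json_decode($line);
        save_record($data, $line);
    }

    fclose($handle);
}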