PHP - How to efficiently read a large remote file and use a buffer in a loop
I would like to understand how to use a buffer when reading a file.
Assume we have a big file with a list of emails, one per line (the delimiter is a classic \n). We want to compare each line against each record of a table in our database, a check along the lines of line_of_file == table_row.
This is a simple task with a normal-sized file, but with a huge file the server usually stops the operation after a few minutes.
So what's the best way to do this kind of thing with a file buffer?
What I have so far is something like this:
$buffer = file_get_contents('file.txt');   // loads the whole file into memory

while ($row = mysql_fetch_array($result)) {
    $email = $row['email'];   // assuming the table column is named "email"
    if (preg_match('/' . preg_quote($email, '/') . '/im', $buffer)) {
        echo $email;
    }
}
$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/', $buffer);
// or: $lines = explode("\n", $buffer);   // "\n" must be double-quoted to be a real newline

while ($row = mysql_fetch_array($result)) {
    $email = $row['email'];   // assuming the table column is named "email"
    if (in_array($email, $lines)) {
        echo $email;
    }
}
3 Answers
Like already suggested in my close votes to your question (hence CW): you can use SplFileObject, which implements Iterator, to iterate over a file line by line and save memory. See my answers for examples.
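For instance, a minimal sketch of that approach under the assumptions of this question (the file name file.txt and the "email" column are illustrative, not from the original post):

$file = new SplFileObject('file.txt');
$file->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::SKIP_EMPTY);

// Collect the emails from the file into a hash set; the file itself is read
// one line at a time, so the full buffer never sits in memory.
$fileEmails = array();
foreach ($file as $line) {
    $fileEmails[trim($line)] = true;
}

while ($row = mysql_fetch_array($result)) {
    if (isset($fileEmails[$row['email']])) {   // 'email' column name is an assumption
        echo $row['email'] . "\n";
    }
}

The lookup array holds only the email strings rather than the raw file buffer, and the isset() check is O(1), so each database row is compared without rescanning the file.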
Don't use file_get_contents for large files. This pulls the entire file into memory all at once. You have to read it in pieces.
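For example, a minimal sketch of reading in fixed-size pieces (the 8 KB chunk size and the file name are illustrative assumptions):

$handle = fopen('file.txt', 'r');
if ($handle === false) {
    die('could not open file.txt');
}

while (!feof($handle)) {
    $chunk = fread($handle, 8192);   // read 8 KB at a time instead of the whole file
    // ... process $chunk here ...
}

fclose($handle);

Note that a fixed-size chunk can cut a line in half, so for a line-oriented task like this one the fgets() approach described in the next answer is usually more convenient.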
Open the file with fopen() and read it incrementally, probably one line at a time with fgets(). file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes.
Depending on how long this takes, you may also need to worry about the PHP execution time limit, or about the browser timing out if it doesn't receive any output for 2 minutes.
Things you might try:
- set_time_limit(0) to avoid running up against the PHP time limit.
- flush(); and possibly ob_flush(); so your output is actually sent over the network (this is a kludge).
- A separate process (started with exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background.
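Putting that together, a minimal sketch of the fgets() approach (the file name, the table and column names, and the idea of loading the database emails into a hash set first are assumptions for illustration):

set_time_limit(0);   // avoid the PHP execution time limit on long runs

// Load the email column from the database once into a hash set for O(1) lookups.
// Table name 'users' and column 'email' are hypothetical.
$dbEmails = array();
$result = mysql_query('SELECT email FROM users');
while ($row = mysql_fetch_array($result)) {
    $dbEmails[$row['email']] = true;
}

// Read the big file one line at a time; only the current line is held in memory.
$handle = fopen('file.txt', 'r');
while (($line = fgets($handle)) !== false) {
    $email = trim($line);
    if (isset($dbEmails[$email])) {
        echo $email . "\n";
        flush();   // push partial output to the browser so it does not time out
    }
}
fclose($handle);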