PHP - how to read a large remote file efficiently and use a buffer in a loop

Posted 2024-11-06 23:31:31

I would like to understand how to use the buffer of a read file.

Assume we have a big file containing a list of emails, one per line (the delimiter is the classic \n),

and we want to compare each line against every record of a table in our database, a check along the lines of line_of_file == table_row.

This is a simple task if you have a normal file; with a huge file, however, the server usually aborts the operation after a few minutes.

So what is the best way of doing this kind of thing with a file buffer?

What I have so far is something like this:

$buffer = file_get_contents('file.txt'); // pulls the whole file into memory
while ($row = mysql_fetch_array($result)) {
    $email = $row['email']; // assuming the table's email column
    if (preg_match('/'.preg_quote($email, '/').'/im', $buffer)) {
        echo $email;
    }
}

$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/', $buffer);
// or: $lines = explode("\n", $buffer); -- double quotes needed, '\n' is a literal backslash-n
while ($row = mysql_fetch_array($result)) {
    $email = $row['email']; // assuming the table's email column
    if (in_array($email, $lines)) {
        echo $email;
    }
}



Comments (3)

青瓷清茶倾城歌 2024-11-13 23:31:31

As already suggested in my close-votes to your question (hence CW):

You can use SplFileObject, which implements Iterator, to iterate over a file line by line and save memory. See my answers to the questions linked there for examples.
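For example, a minimal sketch of that approach (the flag combination is a common SPL idiom, added here for illustration rather than taken from the answer):

// Iterate a large file line by line with SplFileObject;
// only the current line is held in memory at any time.
$file = new SplFileObject('file.txt', 'r');
$file->setFlags(
    SplFileObject::READ_AHEAD
    | SplFileObject::DROP_NEW_LINE
    | SplFileObject::SKIP_EMPTY
);
foreach ($file as $line) {
    // $line is one email from the file; do the comparison here
    echo $line, PHP_EOL;
}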


执笏见 2024-11-13 23:31:31

Don't use file_get_contents for large files. That pulls the entire file into memory all at once. You have to read it in pieces:

$fp = fopen('file.txt', 'r');
while (!feof($fp)) {
    // get one line
    $buffer = fgets($fp);
    // do your stuff
}
fclose($fp);
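To connect this back to the question's comparison, one workable pattern (a sketch of my own, assuming a hypothetical users table with an email column, and using mysqli rather than the old mysql_* API) is to build a lookup set from the table once, then stream the file through it:

// Build a set of emails from the database once
// (table and column names are assumed for illustration).
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$emails = [];
$result = $db->query('SELECT email FROM users');
while ($row = $result->fetch_assoc()) {
    $emails[$row['email']] = true; // array keys give O(1) lookups
}

// Stream the file one line at a time; memory use stays flat.
$fp = fopen('file.txt', 'r');
while (($line = fgets($fp)) !== false) {
    $line = rtrim($line, "\r\n");
    if (isset($emails[$line])) {
        echo $line, " is in the table\n";
    }
}
fclose($fp);

If the table itself is too large to hold in memory, invert the roles, or query per line with a prepared statement.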

a√萤火虫的光℡ 2024-11-13 23:31:31

Open the file with fopen() and read it incrementally, probably one line at a time with fgets().

file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes.

Depending on how long this takes, you may also need to worry about the PHP execution time limit, or about the browser timing out if it doesn't receive any output for 2 minutes.

Things you might try:

  1. set_time_limit(0) to avoid running up against the PHP time limit.
  2. Make sure to output some data every 30 seconds or so, so the browser doesn't time out; make sure to flush() and possibly ob_flush() so your output is actually sent over the network (this is a kludge; see the sketch after this list).
  3. Start a separate process (e.g. via exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background.
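A rough sketch of points 1 and 2 together (the 25-second interval and the progress message are arbitrary choices for illustration, not part of the answer):

set_time_limit(0); // point 1: lift PHP's execution time limit

$fp = fopen('file.txt', 'r');
$lastOutput = time();
while (($line = fgets($fp)) !== false) {
    // ... process $line ...

    // point 2: emit something periodically so the browser doesn't give up
    if (time() - $lastOutput > 25) {
        echo "still working...\n";
        flush();
        if (ob_get_level() > 0) {
            ob_flush(); // only flush the output buffer if one exists
        }
        $lastOutput = time();
    }
}
fclose($fp);

// point 3: or run the whole job in the background instead, e.g.
// exec('php process_file.php > /dev/null 2>&1 &');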
