Is it possible to speed up a recursive file scan in PHP?

Posted on 2024-07-14 21:28:40

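I've been trying to replicate GNU find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations take at least twice as long as find. Are there faster ways of doing this in PHP?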

EDIT: I added a code example using the SPL implementation; its performance is equal to that of the iterative approach.

EDIT2: When calling find from PHP, it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)

// measured at 317% of GNU find's run time when run directly from a shell
function list_recursive($dir) { 
  if ($dh = opendir($dir)) {
    while (false !== ($entry = readdir($dh))) {
      if ($entry == '.' || $entry == '..') continue;

      $path = "$dir/$entry";
      echo "$path\n";
      if (is_dir($path)) list_recursive($path);       
    }
    closedir($dh);
  }
}

// measured at 315% of GNU find's run time when run directly from a shell
function list_iterative($from) {
  $dirs = array($from);  
  while (NULL !== ($dir = array_pop($dirs))) {  
    if ($dh = opendir($dir)) {    
      while (false !== ($entry = readdir($dh))) {      
        if ($entry == '.' || $entry == '..') continue;        

        $path = "$dir/$entry";        
        echo "$path\n";        
        if (is_dir($path)) $dirs[] = $path;        
      }      
      closedir($dh);      
    }    
  }  
}

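// measured at 315% of GNU find's run time when run directly from a shell
function list_recursivedirectoryiterator($path) {
  // SKIP_DOTS drops "." and ".."; RecursiveIteratorIterator performs the descent
  $it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
  );
  foreach ($it as $file) {
    echo $file->getPathname(), "\n";
  }
}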

// measured at 390% of GNU find's run time when run directly from a shell
function list_gnufind($dir) { 
  $dir = escapeshellarg($dir);  // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir", "r");
  while ('' != ($s = fread($h, 2048))) {
    echo $s;
  }
  pclose($h);
}



记忆消瘦 2024-07-21 21:28:41

I'm not sure whether the performance is any better, but you could use a recursive directory iterator to make your code simpler... See RecursiveDirectoryIterator and SplFileInfo.

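// Wrap in RecursiveIteratorIterator so subdirectories are actually visited
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $file)
{
    echo $file->getPathname(), "\n";
}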


离旧人 2024-07-21 21:28:41

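Before you start changing anything, profile your code.

Use something like Xdebug (plus kcachegrind for pretty graphs) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.

My only other advice is to use the SPL directory iterators, as already posted. Letting the internal C code do the work is almost always faster.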
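
Before reaching for a full profiler, a rough wall-clock comparison can already show whether a change helps at all. The sketch below is only an illustration: time_call is a hypothetical helper name, and it discards whatever the callable prints so that terminal speed does not skew the numbers. It tells you how long something took, not where the time went, so it is no substitute for Xdebug.

```php
<?php
// Hypothetical helper (not from the thread): runs $fn with $args and
// returns the elapsed wall-clock time in seconds.
function time_call(callable $fn, ...$args): float {
    ob_start();                       // swallow output so printing is not measured
    $start = microtime(true);
    try {
        $fn(...$args);
    } finally {
        $elapsed = microtime(true) - $start;
        ob_end_clean();               // throw the captured output away
    }
    return $elapsed;
}

// Example usage, assuming list_recursive() and list_iterative() from the
// question are defined:
// printf("recursive: %.3fs\n", time_call('list_recursive', '.'));
// printf("iterative: %.3fs\n", time_call('list_iterative', '.'));
```

Run each variant several times and ignore the first run, since the filesystem cache will be cold for it.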


野生奥特曼 2024-07-21 21:28:41

PHP just cannot perform as fast as C, plain and simple.


长发绾君心 2024-07-21 21:28:41

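Why would you expect interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.

About the only advice I would add is to call ob_start() at the beginning and ob_get_contents() and ob_end_clean() at the end. That might speed things up.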
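
As a minimal sketch of the output-buffering idea, assuming the listing function simply echoes its results: the wrapper name buffered is hypothetical, and whether this is actually faster depends on where the output goes, so measure it before relying on it.

```php
<?php
// Hypothetical wrapper (not from the thread): runs a listing function with
// output buffering switched on and returns everything it printed.
function buffered(callable $lister, string $dir): string {
    ob_start();                               // collect echo output in memory
    try {
        $lister($dir);
        return (string) ob_get_contents();    // grab the buffered text
    } finally {
        ob_end_clean();                       // always close the buffer again
    }
}

// Example usage, assuming list_iterative() from the question is defined:
// echo buffered('list_iterative', '.');
```

The whole listing is then written with a single echo instead of one write per path, at the cost of holding it in memory.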


随风而去 2024-07-21 21:28:41

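You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over those entries. At the very least you'll make the most of the disk I/O caches.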


真心难拥有 2024-07-21 21:28:41

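You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine: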

function list_recursive($dir) { 
  $dir = escapeshellarg($dir);  // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir -type f", "r");
  while (false !== ($s = fgets($h, 1024))) {
    echo $s;
  }
  pclose($h);
}

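However, there might be some directory that's so big you're not going to want to bother with this either. Consider amortizing the slowness in other ways. Your second try can be checkpointed (for example) by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.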
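
The checkpointing idea could look roughly like the sketch below. The function name list_page and the 'pending' session key are hypothetical, not part of the original answer. Unlike list_iterative in the question, this version uses scandir() and pushes files as well as directories onto the stack, so a page can stop at any point and the stack alone is enough to resume. It also lists the starting directory itself, as "find ." does.

```php
<?php
// Hypothetical helper: visits at most $limit paths from the $pending stack.
// Returns array($page, $pending), where $pending is the state to save.
function list_page(array $pending, int $limit): array {
    $page = array();
    // Check the page size first so nothing is popped once the page is full.
    while (count($page) < $limit && null !== ($path = array_pop($pending))) {
        $page[] = $path;
        if (is_dir($path) && !is_link($path)) {   // do not follow symlinks
            $entries = scandir($path);
            if ($entries === false) {
                continue;                         // unreadable directory
            }
            foreach ($entries as $entry) {
                if ($entry === '.' || $entry === '..') continue;
                $pending[] = "$path/$entry";      // visit later
            }
        }
    }
    return array($page, $pending);
}

// Example usage in a request handler (assumes a started session and a
// hypothetical $root variable):
// $pending = $_SESSION['pending'] ?? array($root);
// list($page, $_SESSION['pending']) = list_page($pending, 50);
// echo implode("\n", $page), "\n";
```

The trade-off is memory: for a directory with very many entries, the stack saved in the session grows accordingly.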


阿楠 2024-07-21 21:28:41

Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the PHP manual comments for scandir():

 function scan( $dir ){
        // scandir() reads the whole directory in a single call
        $entries = array_diff( scandir( $dir ), array( ".", ".." ));
        foreach( $entries as $entry ) {
            $path = $dir."/".$entry;
            if ( is_dir( $path ) )
                scan( $path );          // descend into subdirectories
            else
                print $path."\n";       // print files only
        }
 }
