PHP performance file-exists

file_exists() in PHP is too slow. Can anyone suggest a faster alternative?

Posted 2024-08-11 03:09:14

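When displaying images on our website, we check whether each file exists with a call to file_exists(). If the file is missing, we fall back to a dummy image.

However, profiling has shown that this is the slowest part of generating our pages, with file_exists() taking up to 1/2 ms per file. We only test about 40 files, but that still adds 20 ms to the page load time.

Can anyone suggest a way to make this faster? Is there a better way to test whether a file exists? If I build a cache of some kind, how should I keep it in sync?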
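For reference, the pattern described in the question can be sketched roughly as follows. The helper name image_or_placeholder and the example paths are assumptions made for illustration; the question does not show its actual code.

```php
<?php
// Hypothetical helper: return the image path if the file exists,
// otherwise return the path of a placeholder ("dummy") image.
function image_or_placeholder(string $path, string $placeholder): string
{
    return file_exists($path) ? $path : $placeholder;
}

// Example paths are assumptions; adjust them to the real layout.
echo image_or_placeholder(__DIR__ . '/images/photo.jpg', __DIR__ . '/images/dummy.png'), "\n";
```

Each call costs one file_exists() lookup, which is the per-image cost the question is asking about.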

Comments (20)

囚你心 2024-08-18 03:09:14

file_exists() should be a very inexpensive operation. Note too that file_exists builds its own cache to help with performance.

See: http://php.net/manual/en/function.file-exists.php

去了角落 2024-08-18 03:09:14

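Use absolute paths! Depending on your include_path setting, PHP checks all(!) of these directories if you check relative file paths. You could temporarily unset include_path before checking for existence.

realpath() does the same, but I don't know whether it is faster.

But file access I/O is always slow. A hard disk access is usually slower than a calculation in the processor.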
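A minimal sketch of the absolute-path advice, assuming a hypothetical layout in which images sit in an images directory next to the script. The helper name absolute_image_path is made up for illustration. It only shows how to anchor a relative name to a known base directory before calling file_exists(); how much time this saves, if any, depends on the setup and should be measured.

```php
<?php
// Hypothetical helper: join a base directory and a relative name into one
// absolute path, so the check does not depend on the working directory.
function absolute_image_path(string $baseDir, string $relative): string
{
    return rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . ltrim($relative, '/\\');
}

// __DIR__ is the directory of the current script; "images/logo.png" is an
// assumed example name.
$path = absolute_image_path(__DIR__, 'images/logo.png');
echo $path, ': ', file_exists($path) ? 'found' : 'missing', "\n";
```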

酒废 2024-08-18 03:09:14

The fastest way to check existence of a local file is stream_resolve_include_path():

if (false !== stream_resolve_include_path($s3url)) { 
  //do stuff 
}

Performance results stream_resolve_include_path() vs file_exists():

Test name       Repeats         Result          Performance     
stream_resolve  10000           0.051710 sec    +0.00%
file_exists     10000           0.067452 sec    -30.44%

The tests used absolute paths.
Test source is here.
PHP version:

PHP 5.4.23-1~dotdeb.1 (cli) (built: Dec 13 2013 21:53:21)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies

玩心态 2024-08-18 03:09:14

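"If the file is missing, we fall back to a dummy image"

If you're only interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.

That way you'll only have a little redirection overhead and no noticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists.

Just a thought.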

梦幻的味道 2024-08-18 03:09:14

Benchmarks with PHP 5.6:

Existing File:

0.0012969970 : stream_resolve_include_path + include  
0.0013520717 : file_exists + include  
0.0013728141 : @include

Invalid File:

0.0000281333 : file_exists + include  
0.0000319480 : stream_resolve_include_path + include  
0.0001471042 : @include

Invalid Folder:

0.0000281333 : file_exists + include  
0.0000360012 : stream_resolve_include_path + include  
0.0001239776 : @include

Code:

// microtime(true) is less accurate.
function microtime_as_num($microtime){
  $time = array_sum(explode(' ', $microtime));
  return $time;
}

function test_error_suppression_include ($file) {
  $x = 0;
  $x = @include($file);
  return $x;
}

function test_file_exists_include($file) {
  $x = 0;
  $x = file_exists($file);
  if ($x === true) {
    include $file;
  }
  return $x;
}

function test_stream_resolve_include_path_include($file) {
  $x = 0;
  $x = stream_resolve_include_path($file);
  if ($x !== false) {
    include $file;
  }
  return $x;
}

function run_test($file, $test_name) {
  echo $test_name . ":\n";
  echo str_repeat('=', strlen($test_name) + 1) . "\n";

  // Map each test label to the function that implements it.
  // The order matches the original script: @include runs first.
  $tests = array(
    '@include' => 'test_error_suppression_include',
    'stream_resolve_include_path + include' => 'test_stream_resolve_include_path_include',
    'file_exists + include' => 'test_file_exists_include',
  );

  // Key the results by label rather than by elapsed time, so that two
  // equal timings cannot overwrite each other and no float is used as
  // an array key.
  $results = array();
  foreach ($tests as $label => $function) {
    $time_start = microtime();
    $function($file);
    $time_end = microtime();
    $results[$label] = microtime_as_num($time_end) - microtime_as_num($time_start);
  }

  // Fastest first.
  asort($results, SORT_NUMERIC);

  foreach ($results as $label => $seconds) {
    echo number_format($seconds, 10) . ' : ' . $label . "\n";
  }
  echo "\n\n";
}

run_test($argv[1],$argv[2]);

Command line Execution:

php test.php '/path/to/existing_but_empty_file.php' 'Existing File'  
php test.php '/path/to/non_existing_file.php' 'Invalid File'  
php test.php '/path/invalid/non_existing_file.php' 'Invalid Folder'

爱你不解释 2024-08-18 03:09:14

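Create a hashing routine for sharding the files into multiple sub-directories.

filename.jpg -> 012345 -> /01/23/45.jpg

Also, you could use mod_rewrite to return your placeholder image for requests to your image directory that result in a 404.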
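The sharding idea could look roughly like the sketch below. The choice of md5 and of six hex digits is an assumption made for illustration (the answer does not name a hash function), and sharded_path is a hypothetical helper name. With only six digits, two different file names can map to the same path, so a real scheme would probably keep more of the hash or the original name as the final component.

```php
<?php
// Hypothetical helper: map a file name to a three-level path built from
// the first six hex digits of its md5 hash, e.g. /ab/cd/ef.jpg.
function sharded_path(string $filename): string
{
    $ext = pathinfo($filename, PATHINFO_EXTENSION);
    $hash = substr(md5($filename), 0, 6);
    [$a, $b, $c] = str_split($hash, 2);   // three two-character chunks
    $path = "/$a/$b/$c";
    return $ext === '' ? $path : "$path.$ext";
}

echo sharded_path('filename.jpg'), "\n";
```

Spreading files over many small directories keeps each directory listing short, which is the point of the suggestion.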

吝吻 2024-08-18 03:09:14

file_exists() is automatically cached by PHP. I don't think you'll find a faster function in PHP to check for the existence of a file.

See this thread.

溺渁∝ 2024-08-18 03:09:14

I don't know exactly what you want to do, but you could just let the client handle it.

肤浅与狂妄 2024-08-18 03:09:14

If you want to check for the existence of an image file, a much faster way is to use getimagesize!

Faster locally and remotely!

if(!@GetImageSize($image_path_or_url)) // False means no imagefile
 {
 // Do something
 }

离去的眼神 2024-08-18 03:09:14

Old question, I'm going to add an answer here. For php 5.3.8, is_file() (for an existing file) is an order of magnitude faster. For a non-existing file, the times are nearly identical. For PHP 5.1 with eaccelerator, they are a little closer.

PHP 5.3.8 w & w/o APC

time ratio (1000 iterations)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.002305269241333)
    [5."is_link('exists')"] => 1.21x    (0.0027914047241211)
    [7."stream_resolve_inclu"(exists)] => 2.79x (0.0064241886138916)
    [1."file_exists('exists')"] => 13.35x   (0.030781030654907)
    [8."stream_resolve_inclu"(nonexists)] => 14.19x (0.032708406448364)
    [4."is_file('nonexists)"] => 14.23x (0.032796382904053)
    [6."is_link('nonexists)"] => 14.33x (0.033039808273315)
    [2."file_exists('nonexists)"] => 14.77x (0.034039735794067)
)

PHP 5.1 w/ eaccelerator

time ratio (1000x)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.000458002090454)
    [5."is_link('exists')"] => 1.22x    (0.000559568405151)
    [6."is_link('nonexists')"] => 3.27x (0.00149989128113)
    [4."is_file('nonexists')"] => 3.36x (0.00153875350952)
    [2."file_exists('nonexists')"] => 3.92x (0.00179600715637)
    [1."file_exists('exists"] => 4.22x  (0.00193166732788)
)

There are a few caveats.
1) Not all "files" are files: is_file() tests for regular files, not symlinks. So on a *nix system, you can't get away with just is_file() unless you are sure that you are only dealing with regular files. For uploads, etc., this may be a fair assumption, or if the server is Windows based, which does not actually have symlinks. Otherwise, you'll have to test is_file($file) || is_link($file).

2) Performance definitely degrades for all methods if the file is missing, and they become roughly equal.

3) Biggest caveat. All the methods cache the file statistics to speed up lookups, so if the file changes regularly or quickly (deleted, reappears, deleted again), then clearstatcache(); has to be run to ensure that the correct file existence information is in the cache. So I tested those. I left out all the filenames and such. The important thing is that almost all the times converge, except stream_resolve_include, which is 4x as fast. Again, this server has eaccelerator on it, so YMMV.

time ratio (1000x)
Array
(
    [7."stream_resolve_inclu...;clearstatcache();"] => 1.00x    (0.0066831111907959)
    [1."file_exists(...........;clearstatcache();"] => 4.39x    (0.029333114624023)
    [3."is_file(................;clearstatcache();] => 4.55x    (0.030423402786255)
    [5."is_link(................;clearstatcache();] => 4.61x    (0.030798196792603)
    [4."is_file(................;clearstatcache();] => 4.89x    (0.032709360122681)
    [8."stream_resolve_inclu...;clearstatcache();"] => 4.90x    (0.032740354537964)
    [2."file_exists(...........;clearstatcache();"] => 4.92x    (0.032855272293091)
    [6."is_link(...............;clearstatcache();"] => 5.11x    (0.034154653549194)
)

Basically, the idea is: if you're 100% sure that it is a file, not a symlink or a directory, and in all probability it will exist, then use is_file(). You'll see a definite gain. If the file could be a file or a symlink at any moment, then a failed is_file() at 14x plus is_link() at 14x (is_file() || is_link()) will end up being 2x slower overall. If the file's existence changes A LOT, then use stream_resolve_include_path().

So it depends on your usage scenario.
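The combined check from caveat 1 and the cache reset from caveat 3 might be wrapped up as in the sketch below. The function name path_is_file_or_link and the $fresh flag are hypothetical; the sketch makes no claim about the timings quoted above.

```php
<?php
// Hypothetical helper: true if $path is a regular file or a symlink.
// Pass $fresh = true when the file may have changed recently, so that
// PHP's cached stat information for this path is discarded first.
function path_is_file_or_link(string $path, bool $fresh = false): bool
{
    if ($fresh) {
        clearstatcache(true, $path);
    }
    return is_file($path) || is_link($path);
}

var_dump(path_is_file_or_link(__FILE__));
```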

海拔太高太耀眼 2024-08-18 03:09:14

If you are only checking for existing files, use is_file().
file_exists() checks for an existing file or directory, so is_file() might be a little faster.

半边脸i 2024-08-18 03:09:14

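In 2021, 12 years after the question was asked, I have the same use case. I wasn't satisfied with the answers here, so I ran an experiment. In a loop, I check with file_exists whether each of about 40 images is present in the image folder.

Here are the figures in milliseconds (PHP 7.4):

On the local dev machine (Win10, WAMP, Samsung SSD, Intel 3.4 GHz): roughly 0.1 (1/10) ms per image, with about 1,000 images in the folder;
On the server (a very basic, cheap one: VPS with 1 Intel Xeon, 2 GB RAM, SSD, Ubuntu, LAMP): roughly 0.01 (1/100) ms per image, with 14,000 images in the folder;

The server is 10 times faster than the dev machine, and either figure is negligible from an overall UX performance point of view, where 30-50 ms is roughly the first noticeable threshold.

On the server, checking an array of 40 images takes 0.4 ms to find out whether any of them is missing. By the way, performance is the same whether or not some of the images exist.

So disk performance should not be the reason to decide for or against checking with file_exists. Check if you need to.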
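A measurement of this kind could be reproduced with something like the sketch below. It is not the code used for the figures above (that code is not shown); time_file_exists is a hypothetical helper, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: time a batch of file_exists() calls and report
// how many paths were missing, the total time and the time per file (ms).
function time_file_exists(array $paths): array
{
    $start = hrtime(true);               // monotonic clock, nanoseconds
    $missing = 0;
    foreach ($paths as $path) {
        if (!file_exists($path)) {
            $missing++;
        }
    }
    $totalMs = (hrtime(true) - $start) / 1e6;

    return [
        'missing'     => $missing,
        'total_ms'    => $totalMs,
        'per_file_ms' => $paths ? $totalMs / count($paths) : 0.0,
    ];
}

// Example input: this script itself plus a name assumed not to exist.
print_r(time_file_exists([__FILE__, __DIR__ . '/no_such_image.jpg']));
```

Running it several times and on the real image list gives a more honest picture than a single pass, since the first lookup of each path tends to be slower than repeats.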

野稚 2024-08-18 03:09:14

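Are they all in the same directory? If so, it may be worth getting the list of files, storing them in a hash, and comparing against that rather than doing all the file_exists lookups.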
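One way to read this suggestion, as a sketch: list the directory once per request and turn the names into array keys, so each later check is an isset() lookup. The helper name directory_as_set is hypothetical. Note that scandir() also returns sub-directory names, so this tests "a name exists in the directory", not "a regular file exists".

```php
<?php
// Hypothetical helper: read a directory once and return its entries as
// array keys (name => true), skipping "." and "..".
function directory_as_set(string $dir): array
{
    $names = is_dir($dir) ? scandir($dir) : false;
    if ($names === false) {
        return [];
    }
    return array_fill_keys(array_diff($names, ['.', '..']), true);
}

// One directory read, then cheap lookups for every image on the page.
$set = directory_as_set(__DIR__);
echo isset($set[basename(__FILE__)]) ? "found\n" : "missing\n";
```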

紫﹏色ふ单纯 2024-08-18 03:09:14

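I find 1/2 ms per call very, very affordable. I don't think there are much faster alternatives around, as the file functions are very close to the lower layers that handle file operations.

You could, however, write a wrapper around file_exists() that caches results in memcache or a similar facility. That should reduce the time to next to nothing in everyday use.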
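A minimal sketch of such a wrapper, under stated assumptions: the class name CachedFileExists is hypothetical, the cache here is a plain in-process array with a time-to-live, and a shared store such as memcache would take the place of that array in a real deployment. It needs PHP 7.4 or later for typed properties. The $now parameter exists only so the expiry logic can be exercised without waiting.

```php
<?php
// Hypothetical wrapper: remember file_exists() results for a while.
final class CachedFileExists
{
    /** @var array<string, array{0: bool, 1: int}> path => [exists, expiresAt] */
    private array $cache = [];
    private int $ttl;

    public function __construct(int $ttlSeconds = 60)
    {
        $this->ttl = $ttlSeconds;
    }

    public function exists(string $path, ?int $now = null): bool
    {
        $now = $now ?? time();
        if (isset($this->cache[$path]) && $this->cache[$path][1] > $now) {
            return $this->cache[$path][0];   // still fresh: no disk access
        }
        clearstatcache(true, $path);          // drop PHP's own stat cache entry
        $exists = file_exists($path);
        $this->cache[$path] = [$exists, $now + $this->ttl];
        return $exists;
    }

    // Call this when a file is added or removed, to keep the cache in sync.
    public function forget(string $path): void
    {
        unset($this->cache[$path]);
    }
}

$checker = new CachedFileExists(60);
var_dump($checker->exists(__FILE__));
```

The trade-off is staleness: until an entry expires or forget() is called, a deleted file still reports as existing and a new file as missing.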

酸甜透明夹心 2024-08-18 03:09:14

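You could even run a cronjob that periodically creates a list of images and stores it in a DB/file/BDB/...

Every half hour should be fine, but make sure to create an interface to reset the cache in case files are added or deleted.

It is then also easy to run find . -mmin -30 -print0 in the shell and add the new files.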

假情假意假温柔 2024-08-18 03:09:14

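When you save a file to a folder, if the upload was successful, you can store the path in a DB table.

Then you only need to query the database to find the path of the requested file.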

一个人练习一个人 2024-08-18 03:09:14

I came to this page looking for a solution, and it seems fopen may do the trick. If you use this code, you might want to disable error logging for the files that are not found.

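<?php
// Repeat the check 99 times to get a feel for the cost.
for ($n = 1; $n < 100; $n++) {
    clearstatcache();
    // @ suppresses the warning PHP raises when the file cannot be opened.
    $h = @fopen("files.php", "r");
    if ($h) {
        echo "F"; // found
        fclose($h);
    } else {
        echo "N"; // not found
    }
}
?>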

冰雪之触 2024-08-18 03:09:14

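I think the best way is to keep the image URL in the database and then put it in a session variable, especially when you have authentication. That way you don't have to check every time a page reloads.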

私野 2024-08-18 03:09:14

What about glob()? But I'm not sure whether it's fast.

http://www.php.net/manual/en/function.glob.php

养猫人 2024-08-18 03:09:14

I'm not even sure this will be any faster, but since it sounds like you want to benchmark anyway:

Build a cache of a large array of all image paths.

$array = array('/path/to/file.jpg' => true, '/path/to/file2.gif' => true);

Update the cache hourly or daily depending on your requirements. You would do this utilizing cron to run a PHP script which will recursively go through the files directory to generate the array of paths.

When you wish to check if a file exists, load your cached array and do a simple isset() check for a fast array index lookup:

if (isset($myCachedArray[$imgpath])) {
    // handle display
}

There will still be overhead from loading the cache but it will hopefully be small enough to stay in memory. If you have multiple images you are checking for on a page you will probably notice more significant gains as you can load the cache on page load.
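The cron-built cache described above could be sketched as follows. The function names build_image_cache and load_image_cache, and the choice of storing the array as a PHP file written with var_export(), are assumptions for illustration; the answer does not say how the array is stored.

```php
<?php
// Hypothetical cron-side step: walk $imageDir recursively and write every
// file path into $cacheFile as a PHP array (path => true).
function build_image_cache(string $imageDir, string $cacheFile): int
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($imageDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $paths[$file->getPathname()] = true;
        }
    }

    // Write to a temporary file first, then rename, so a page request
    // never loads a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($paths, true) . ';');
    rename($tmp, $cacheFile);

    return count($paths);
}

// Hypothetical page-side step: load the array once per request.
function load_image_cache(string $cacheFile): array
{
    if (!is_file($cacheFile)) {
        return [];
    }
    return require $cacheFile;
}

// Usage (paths are assumed examples):
//   build_image_cache('/var/www/images', '/var/cache/image_paths.php'); // from cron
//   $myCachedArray = load_image_cache('/var/cache/image_paths.php');    // per page
//   if (isset($myCachedArray[$imgpath])) { /* handle display */ }
```

Keys are the full paths as the iterator reports them, so lookups must use the same form of path (same base directory, same separators) that was passed to build_image_cache().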
