file_exists() in PHP is too slow. Can anyone suggest a faster alternative?
When displaying images on our website, we check whether each file exists with a call to file_exists(). We fall back to a dummy image if the file is missing.
However, profiling has shown that this is the slowest part of generating our pages, with file_exists() taking up to 1/2 ms per file. We are only testing 40 or so files, but this still adds 20 ms to the page load time.
Can anyone suggest a way of making this go faster? Is there a better way of testing whether the file is present? If I build a cache of some kind, how should I keep it in sync?
file_exists() should be a very inexpensive operation. Note too that file_exists() builds its own cache to help with performance.
See: http://php.net/manual/en/function.file-exists.php
Use absolute paths! Depending on your include_path setting, PHP checks all(!) of these directories if you check a relative file path. You might unset include_path temporarily before checking for existence.
realpath() does the same, but I don't know if it is faster.
But file access I/O is always slow. A hard disk access IS normally slower than calculating something in the processor.
The fastest way to check for the existence of a local file is stream_resolve_include_path():
Performance results, stream_resolve_include_path() vs file_exists():
Absolute paths were used in the test.
Test source is here.
PHP version:
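A minimal sketch of how the stream_resolve_include_path() check could be wired into the dummy-image fallback from the question. The helper name local_path_exists() and the image paths are assumptions made for illustration, not part of the answer's test code.

```php
<?php
// Sketch only: local_path_exists() is a hypothetical helper name.

/**
 * Returns true if $path resolves to something on disk.
 * stream_resolve_include_path() returns the resolved absolute
 * path on success and false on failure.
 */
function local_path_exists(string $path): bool
{
    return stream_resolve_include_path($path) !== false;
}

// Fall back to a dummy image when the real one is missing.
$image = __DIR__ . '/images/photo.jpg';   // assumed location
$dummy = __DIR__ . '/images/dummy.jpg';   // assumed location
$src   = local_path_exists($image) ? $image : $dummy;
```

Whether this is actually faster than file_exists() on a given setup is something to measure rather than assume.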
If you're just interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.
That way you'll just have a little redirection overhead and no noticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists().
Just a thought.
Benchmarks with PHP 5.6:
Existing File:
Invalid File:
Invalid Folder:
Code:
Command line Execution:
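For anyone who wants to repeat this kind of measurement, here is a rough sketch of a micro-benchmark covering the same three cases (existing file, invalid file, invalid folder). It is not the code from this answer; time_check(), the test paths, and the iteration count are all assumptions, and the numbers will vary by machine and PHP version.

```php
<?php
// Sketch of a micro-benchmark; not the original benchmark code.

/**
 * Calls $check($path) $iterations times and returns the elapsed
 * wall-clock time in milliseconds.
 */
function time_check(callable $check, string $path, int $iterations = 10000): float
{
    $start = microtime(true);            // seconds as a float
    for ($i = 0; $i < $iterations; $i++) {
        $check($path);
    }
    return (microtime(true) - $start) * 1000.0;
}

// Hypothetical test paths: this script itself, a missing file,
// and a file inside a missing folder.
$cases = [
    'existing file'  => __FILE__,
    'invalid file'   => __DIR__ . '/no-such-file.png',
    'invalid folder' => __DIR__ . '/no-such-dir/no-such-file.png',
];
$checks = ['file_exists', 'is_file', 'stream_resolve_include_path'];

foreach ($cases as $label => $path) {
    foreach ($checks as $fn) {
        printf("%-15s %-30s %8.2f ms\n", $label, $fn, time_check($fn, $path));
    }
}
```

Saved under a name of your choice, it can be run from the command line with the php binary. Keep in mind that repeated checks of the same path mostly measure PHP's stat cache, not the disk.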
Create a hashing routine for sharding the files into multiple sub-directories.
Also, you could use mod_rewrite to return your placeholder image for requests to your image directory that 404.
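One possible shape for such a hashing routine, as a sketch: the function name sharded_path(), the use of md5, and the two-level layout are assumptions, and any stable hash would do. The point is only that each directory stays small, so lookups inside it stay cheap.

```php
<?php
/**
 * Maps a file name to a sharded path such as "<root>/ab/cd/<name>",
 * where "ab" and "cd" are successive pairs of hex characters from
 * the md5 hash of the name. $levels is expected to be >= 1.
 */
function sharded_path(string $root, string $name, int $levels = 2): string
{
    $hash  = md5($name);
    $parts = [];
    for ($i = 0; $i < $levels; $i++) {
        $parts[] = substr($hash, $i * 2, 2);   // two hex chars per level
    }
    return rtrim($root, '/') . '/' . implode('/', $parts) . '/' . $name;
}
```

The same function has to be used both when storing an upload and when building the image URL, otherwise the paths will not match.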
file_exists() is automatically cached by PHP. I don't think you'll find a faster function in PHP to check for the existence of a file.
See this thread.
I don't exactly know what you want to do, but you could just let the client handle it.
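One way to read "let the client handle it" is the browser's onerror handler on the img element, so the server never checks the file at all. This is an assumption about what was meant. The helper img_with_fallback() and the paths below are made up for illustration.

```php
<?php
/**
 * Builds an <img> tag that swaps in $fallback if the browser
 * fails to load $src. No filesystem check happens on the server.
 */
function img_with_fallback(string $src, string $fallback): string
{
    // json_encode() produces a quoted JavaScript string literal;
    // htmlspecialchars() then makes it safe inside the HTML attribute.
    // Resetting onerror first avoids a loop if the fallback is missing too.
    $js = 'this.onerror=null;this.src='
        . json_encode($fallback, JSON_UNESCAPED_SLASHES) . ';';

    return sprintf(
        '<img src="%s" onerror="%s">',
        htmlspecialchars($src, ENT_QUOTES),
        htmlspecialchars($js, ENT_QUOTES)
    );
}

echo img_with_fallback('/images/photo.jpg', '/images/dummy.jpg'), "\n";
```

The trade-off is an extra failed request on the client for every missing image.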
If you want to check for the existence of an image file, a much faster way is to use getimagesize()!
Faster locally and remotely!
Old question, but I'm going to add an answer here. For PHP 5.3.8, is_file() (for an existing file) is an order of magnitude faster. For a non-existing file, the times are nearly identical. For PHP 5.1 with eAccelerator, they are a little closer.
PHP 5.3.8 with and without APC
PHP 5.1 with eAccelerator
There are a couple of caveats.
1) Not all "files" are files: is_file() tests for regular files, not symlinks. So on a *nix system, you can't get away with just is_file() unless you are sure that you are only dealing with regular files. For uploads, etc., this may be a fair assumption, or if the server is Windows based, which does not actually have symlinks. Otherwise, you'll have to test is_file($file) || is_link($file).
2) Performance definitely degrades for all methods if the file is missing, and they become roughly equal.
3) Biggest caveat. All the methods cache the file statistics to speed up lookups, so if the file is changing regularly or quickly (deleted, reappears, deleted again), then clearstatcache(); has to be run to ensure that the correct file existence information is in the cache. So I tested those. I left out all the filenames and such. The important thing is that almost all the times converge, except stream_resolve_include_path(), which is 4x as fast. Again, this server has eAccelerator on it, so YMMV.
Basically, the idea is: if you're 100% sure that it is a file, not a symlink or a directory, and in all probability it will exist, then use is_file(). You'll see a definite gain. If the file could be a file or a symlink at any moment, then you pay for a failed is_file() (14x) plus is_link() (14x) in is_file() || is_link(), and it will end up being 2x slower overall. If the file's existence changes A LOT, then use stream_resolve_include_path().
So it depends on your usage scenario.
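To make caveats 1 and 3 concrete, here is a small sketch that combines the is_file() || is_link() test with an optional clearstatcache() call. The function name file_or_link_exists() and the $fresh flag are illustrative assumptions.

```php
<?php
/**
 * True if $path is a regular file or a symlink.
 * Pass $fresh = true when the file may have been created or
 * deleted by another process since it was last checked.
 */
function file_or_link_exists(string $path, bool $fresh = false): bool
{
    if ($fresh) {
        // Clears the realpath cache and the cached stat data
        // for this one path only, not the whole cache.
        clearstatcache(true, $path);
    }
    return is_file($path) || is_link($path);
}
```

Clearing the cache on every call gives up the speed advantage, so $fresh only makes sense for paths that really do change during the lifetime of the script.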
If you are only checking for existing files, use is_file(). file_exists() checks for an existing file OR directory, so maybe is_file() could be a little faster.
In 2021, 12 years after the question was asked, I have the same use case. I was not satisfied with the answers here and ran an experiment. In a loop, I use file_exists() to check whether each of around 40 images exists among the images in a folder.
Here are the figures (PHP 7.4) in milliseconds:
The server is 10 times faster than the dev machine, and the cost is indistinguishable from an overall UX performance point of view, where 30-50 ms is roughly the first noticeable threshold.
On the server, checking the array of 40 images, I spend 0.4 ms to find out whether any of them is missing. BTW, there is no difference in performance whether some of the images exist or not.
So disk performance should not be a reason to avoid checking with file_exists(). Check if you need to.
Are they all in the same directory? If so it may be worth getting the list of files and storing them in a hash and comparing against that rather than all the file_exists lookups.
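A sketch of that idea, assuming all images sit in a single directory: read the directory once, flip the names into array keys, and test with isset(). The function name list_directory() and the file names are hypothetical.

```php
<?php
/**
 * Reads one directory and returns its entries as array keys,
 * so each lookup is a hash probe instead of a filesystem call.
 * Sub-directories are included in the result as well.
 */
function list_directory(string $dir): array
{
    $names = is_dir($dir) ? scandir($dir) : false;
    if ($names === false) {
        return [];
    }
    // Drop the "." and ".." entries, then turn names into keys.
    return array_flip(array_diff($names, ['.', '..']));
}

$present = list_directory(__DIR__);           // one directory read per page
$wanted  = ['photo1.jpg', 'photo2.jpg'];      // hypothetical image names
foreach ($wanted as $name) {
    $src = isset($present[$name]) ? $name : 'dummy.jpg';
    // ... emit the <img> tag for $src here
}
```

This trades 40 stat calls for one directory read, which only pays off if the directory is not huge.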
I find 1/2 ms per call very, very affordable. I don't think there are much faster alternatives around, as the file functions are very close to the lower layers that handle file operations.
You could, however, write a wrapper around file_exists() that caches results in memcache or a similar facility. That should reduce the time to next to nothing in everyday use.
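A minimal sketch of such a wrapper. To keep it self-contained it stores results in a plain PHP array, which only lives for one request. The class name CachedFileExists, the TTL, and the forget() method are assumptions, and a shared store such as memcached would have to replace the array to get any benefit across requests.

```php
<?php
/**
 * Caches file_exists() results for $ttl seconds.
 * The array store is a stand-in for memcached or a similar cache.
 */
final class CachedFileExists
{
    /** @var array path => [exists (bool), expiry timestamp (int)] */
    private $store = [];
    /** @var int */
    private $ttl;

    public function __construct(int $ttl = 300)
    {
        $this->ttl = $ttl;
    }

    public function exists(string $path): bool
    {
        $now = time();
        if (isset($this->store[$path]) && $this->store[$path][1] > $now) {
            return $this->store[$path][0];        // cache hit
        }
        $exists = file_exists($path);             // cache miss: ask the filesystem
        $this->store[$path] = [$exists, $now + $this->ttl];
        return $exists;
    }

    /** Call after adding or deleting a file to keep the cache in sync. */
    public function forget(string $path): void
    {
        unset($this->store[$path]);
    }
}
```

On the question of keeping it in sync: a short TTL bounds how stale an entry can get, and calling forget() from the upload and delete code paths covers the rest.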
You could use a cron job to periodically create a list of images and store it in a DB/file/BDB/...
Every half an hour should be fine, but be sure to create an interface to reset the cache in case files are added or deleted.
And then, it's also easy to run find . -mmin -30 -print0 on the shell and add new files.
When you save a file to a folder, if the upload was successful, you can store the path in a DB table.
Then you will just have to query the database to find the path of the requested file.
I came to this page looking for a solution, and it seems fopen() may do the trick. If you use this code, you might want to disable error logging for the files that are not found.
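The code this answer refers to is not shown here, so the following is only a guess at the general shape of an fopen()-based check. The function name can_open() is made up, and the @ operator is one way to keep the warning for a missing file from reaching the error output.

```php
<?php
/**
 * Tries to open $path for reading. Success means the file exists
 * and is readable by the current user. The @ operator suppresses
 * the warning that fopen() raises when the open fails.
 */
function can_open(string $path): bool
{
    $handle = @fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    fclose($handle);    // do not leak the file handle
    return true;
}
```

Note that this answers a slightly different question than file_exists(): a file that exists but is not readable counts as missing.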
I think the best way is to keep the image URL in the database and then put it in a session variable, especially when you have authentication. This way you don't have to check each time a page reloads.
What about glob()? But I'm not sure if it's fast.
http://www.php.net/manual/en/function.glob.php
I'm not even sure if this will be any faster, but it appears as though you would still like to benchmark, so:
Build a cache consisting of a large array of all image paths.
Update the cache hourly or daily, depending on your requirements. You would do this using cron to run a PHP script that recursively goes through the files directory to generate the array of paths.
When you wish to check whether a file exists, load your cached array and do a simple isset() check for a fast array index lookup:
There will still be overhead from loading the cache, but it will hopefully be small enough to stay in memory. If you are checking for multiple images on a page, you will probably notice more significant gains, as you can load the cache on page load.
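The steps above could be sketched roughly as follows. The function names, the cache file format (a PHP file that returns an array), and the assumption of Unix-style paths are all illustrative choices and not taken from the answer.

```php
<?php
/**
 * Cron side: walk $root recursively and write every file path,
 * relative to $root, as the keys of a PHP array in $cacheFile.
 * $cacheFile should live outside $root.
 */
function build_image_cache(string $root, string $cacheFile): void
{
    $root  = rtrim($root, '/');
    $paths = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile()) {
            // Strip "<root>/" so keys look like "sub/dir/photo.jpg".
            $paths[substr($file->getPathname(), strlen($root) + 1)] = true;
        }
    }
    // Write to a temp file and rename it into place, so a page
    // request never loads a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($paths, true) . ';');
    rename($tmp, $cacheFile);
}

/** Page side: load the array once per request. */
function load_image_cache(string $cacheFile): array
{
    if (!is_file($cacheFile)) {
        return [];
    }
    return include $cacheFile;
}
```

On a page, the lookup is then a call to load_image_cache() followed by isset($images['sub/dir/photo.jpg']) for each image, falling back to the dummy image when the key is absent.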