Alternatives to stat() for getting the file type?
Are there any alternatives to stat (which is found on most Unix systems) which can determine the file type? The manpage says that a call to stat is expensive, and I need to call it quite often in my app.
Comments (5)
The alternative is fstat() if you already have the file open (so you have a file descriptor for it), or lstat() if you want to find out about symbolic links rather than the file the symlink points to.
I think the man page is exaggerating the cost; it is not much worse than any other system call that has to resolve the name of the file into an inode. It is more costly than getpid(); it is less costly than open().
The "file type" that stat() gives you is whether the file is a regular file or something like a device file or directory, among other things like its size and inode number. If that's what you need to know, then you must use stat().
If what you actually need to know is the type of the file's contents (e.g. text file, JPEG image, MP3 audio), then you have two options. You can guess based on the filename extension (if it ends in ".mp3", the file probably contains MP3 audio), or you can use libmagic, which actually opens the file and reads some of its contents to figure out what it is. The libmagic approach is more expensive (if you're trying to avoid stat(), you probably want to avoid open() too), but less prone to error (in case that ".mp3" file is actually a JPEG image, for example).
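To make the "read some of the contents" idea concrete, here is a minimal sketch that checks just two well-known signatures: the JPEG start-of-image bytes (FF D8 FF) and the "ID3" tag that begins many MP3 files. This is a simplified stand-in, not libmagic's API, and the helper names sniff_bytes and sniff_file are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Guess a content type from the leading bytes (hypothetical helper).
   Only two signatures are checked; a real database knows far more. */
static const char *sniff_bytes(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return "image/jpeg";   /* JPEG start-of-image marker */
    if (len >= 3 && memcmp(buf, "ID3", 3) == 0)
        return "audio/mpeg";   /* MP3 with a leading ID3v2 tag */
    return "unknown";
}

/* Open the file, read a few bytes, and classify them (hypothetical helper).
   Returns NULL if the file cannot be opened. */
static const char *sniff_file(const char *path)
{
    unsigned char buf[8];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return sniff_bytes(buf, n);
}
```

Note the cost the answer mentions: every call to sniff_file pays for an open and a read, which is more work than a single stat().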
Under Linux with some filesystems the file type (regular, char device, block device, directory, pipe, symlink, ...) is stored in the linux_dirent struct, which is how the kernel hands directory entries to applications via the getdents system call. If the only thing you need from the stat structure is the file type, and you need it for all or many entries of a directory, you could use getdents directly (rather than readdir) and try to get the file type from there, calling stat only when linux_dirent reports an unknown file type (DT_UNKNOWN). Depending on your application's filesystem usage pattern this could be faster than using stat if you are on Linux, but stat should be fast in many cases.
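A sketch of that fallback pattern follows. Instead of calling getdents directly, it swaps in readdir(), because glibc (and the BSDs) expose the same information as the d_type field of struct dirent; that substitution is my assumption about what is convenient, not something the answer prescribes. The helper name count_subdirs is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants on glibc */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Count the subdirectories of `path` (hypothetical helper).
   Uses d_type when the filesystem fills it in and falls back to
   fstatat() only for entries reported as DT_UNKNOWN.
   Returns -1 if the directory cannot be opened. */
static int count_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        if (ent->d_type == DT_DIR) {
            count++;                      /* type known, no stat needed */
        } else if (ent->d_type == DT_UNKNOWN) {
            struct stat st;               /* filesystem gave no type */
            if (fstatat(dirfd(dir), ent->d_name, &st,
                        AT_SYMLINK_NOFOLLOW) == 0 && S_ISDIR(st.st_mode))
                count++;
        }
    }
    closedir(dir);
    return count;
}
```

The DT_UNKNOWN branch matters: some filesystems never fill in the type, so code that skips the fallback will silently miscount there.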
Stat's speed has mostly to do with locating the data that is being asked for on disk. If you are traversing a directory recursively stat-ing all of the files then each stat should end up being fairly quick overall because most of the work getting the data stat needs ends up cached before you ask the kernel for it by a previous call to stat. If on the other hand you stat the same number of files randomly distributed around the system then the kernel will likely have to read from disk several directories for each file you are going to call stat on.
fstat should always be very fast since the kernel should already have the data you're asking for in RAM, as it needs to access it for the file to be in the open state, and the kernel won't have to go through the trouble of traversing the path of the filename to see if each component is in RAM or on disk and possibly reading in a directory from disk (but likely not having to), only to discover that it has the data that you are asking for in RAM.
That being said, calling stat on an open file should be faster than calling it on an unopened file.
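If the file is going to be opened anyway, the open-then-fstat() ordering described above might look like the sketch below. The helper name open_and_classify is made up for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` read-only and report whether it is a regular file
   (hypothetical helper). On success returns the open descriptor and
   stores 1 or 0 in *is_regular; on failure returns -1. */
static int open_and_classify(const char *path, int *is_regular)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {   /* no path lookup: works on the descriptor */
        close(fd);
        return -1;
    }
    *is_regular = S_ISREG(st.st_mode) ? 1 : 0;
    return fd;
}
```

The path is resolved once, by open(); fstat() then reads the metadata of the file that is already open, which also avoids a race where the name is re-pointed between a stat() and a later open().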
Are you aware of the "magic" file on *nix systems? By querying a file from the command line with something like
file myfile.ext
you can get the real file type. This is done by reading the contents of the file rather than looking at its extension, and is widely used on *nix (Linux, Unix, ...) systems.
If your application is expected to run on Linux systems, why not try inotify(7)? It reports changes as events, so it is definitely faster than repeatedly stat-ing many files.
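For what it is worth, a minimal inotify setup might look like the sketch below. It assumes you only care about entries being created or deleted in one directory; the helper names watch_dir and event_is_dir are made up for this example. inotify events only distinguish directories from everything else (the IN_ISDIR flag), so a finer file type still needs one of the stat() family.

```c
#include <stdint.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching `dir` for created and deleted entries
   (hypothetical helper). Returns an inotify descriptor, or -1 on failure. */
static int watch_dir(const char *dir)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Events carry IN_ISDIR when the subject is a directory, so this
   coarse file/directory split needs no stat() call. */
static int event_is_dir(uint32_t mask)
{
    return (mask & IN_ISDIR) != 0;
}
```

Reading from the returned descriptor with read() yields struct inotify_event records; passing each record's mask field to event_is_dir gives the directory-or-not answer.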