Disk reads/seeks for directory listing on a shared unix server

Posted 2024-09-11 17:41:02


I want to get a better understanding of how disk reads work for a simple ls command and for a cat * command on a particular folder.

As I understand it, disk reads are the "slowest" operation for a server/any machine, and a webapp I have in mind will be making ls and cat * calls on a certain folder very frequently.

What are "ballpark" estimates of the disk reads involved for an "ls" and for a "cat *" for the following numbers of entries?

    Entries      Disk reads for ls      Disk reads for cat *
    200
    2,000
    20,000
    200,000

Each file is just a single line of text.


Comments (1)

入画浅相思 2024-09-18 17:41:02


Tricky to answer - which is probably why the question went so long without getting any answer at all.

In part, the answer will depend on the file system - different file systems will give different answers. However, doing 'ls' requires reading the pages that hold the directory entries, plus (if attributes are listed) reading the pages that hold the inodes identified in the directory. How many pages that is - and therefore how many disk reads - depends on the page size and on the directory size. If you think in terms of 6-8 bytes of overhead per file name, you won't be too far off. If the names are about 12 characters each, then you have about 20 bytes per file, and if your pages are 4096 bytes (4 KB), then you have about 200 files per directory page.
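As a minimal back-of-envelope sketch of that arithmetic - assuming the same 12-character names, ~8 bytes of per-entry overhead, and 4 KB pages; the constants and function name are illustrative, not taken from any particular filesystem:

    # Back-of-envelope: directory pages needed to hold N file names.
    import math

    PAGE_SIZE = 4096        # bytes per page (assumption from the text)
    ENTRY_BYTES = 12 + 8    # ~12-char name + ~8 bytes overhead (assumption)

    def dir_pages(n_files):
        entries_per_page = PAGE_SIZE // ENTRY_BYTES   # ~204 entries per page
        return math.ceil(n_files / entries_per_page)

    for n in (200, 2_000, 20_000, 200_000):
        print(n, dir_pages(n))   # -> 1, 10, 99, 981 pages respectively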

If you just list names and not other attributes with 'ls', you are done. If you list attributes (size, etc.), then the inodes have to be read too. I'm not sure how big a modern inode is. A couple of decades ago, on a primitive file system, it was 64 bytes each; it has probably grown since then. There will be a number of inodes per page, but you can't be sure that the inodes you need are contiguous (adjacent to each other on disk). In the worst case, you might need to read another page for each separate file, but that is pretty unlikely in practice. Fortunately, the kernel is pretty good about caching disk pages, so it is unlikely to have to reread a page. It is impossible for us to make a good guess at the density of the relevant inode entries; perhaps 4 inodes per page, but any estimate from 1 to 64 might be plausible. Hence, you might have to read 50 pages (200 files ÷ 4 inodes per page) for a directory containing 200 files.

When it comes to running 'cat' on the files, the system has to locate the inode for each file, just as with 'ls'; it then has to read the data for the file. Unless the data is stored in the inode itself (I think that is or was possible in some file systems with biggish inodes and small enough file bodies), you have to read one page per file - unless partial pages for small files are bunched together on one page (again, I seem to remember hearing that can happen in some file systems).
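If you would rather measure than estimate, stat() on a Unix system reports the 512-byte blocks actually allocated to a file in st_blocks, so you can count the data pages for a folder directly. A minimal sketch (the folder path is a placeholder):

    import os

    def data_pages(directory, page_size=4096):
        """Sum the 4 KB pages actually allocated to regular files in a folder."""
        pages = 0
        for entry in os.scandir(directory):           # one pass over the folder
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                # st_blocks is in 512-byte units on Unix; round up to pages
                pages += (st.st_blocks * 512 + page_size - 1) // page_size
        return pages

    print(data_pages("/some/folder"))   # hypothetical path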

So, for a 200 file directory (the sketch after this list extrapolates to the larger sizes):

  • Plain ls: 1 page
  • ls -l: 51 pages
  • cat *: 251 pages
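Extrapolating those same guessed densities - about 200 directory entries per page, 4 usable inodes per page, one data page per file - to the other entry counts from the question gives numbers like the following. Every constant here is an assumption carried over from the estimate above, and kernel caching would cut the real read counts substantially:

    import math

    ENTRIES_PER_DIR_PAGE = 200   # ~20 bytes per entry in a 4 KB page
    INODES_PER_PAGE = 4          # the guess used above; anywhere 1-64 plausible
    DATA_PAGES_PER_FILE = 1      # one-line files: about one page each

    for n in (200, 2_000, 20_000, 200_000):
        dir_pages = math.ceil(n / ENTRIES_PER_DIR_PAGE)
        inode_pages = math.ceil(n / INODES_PER_PAGE)
        plain_ls = dir_pages                            # names only
        long_ls = dir_pages + inode_pages               # ls -l
        cat_all = long_ls + n * DATA_PAGES_PER_FILE     # cat *
        print(f"{n:>7}: ls={plain_ls}  ls -l={long_ls}  cat *={cat_all}")

For 200 files this reproduces the 1 / 51 / 251 pages above; for 200,000 files it gives roughly 1,000 / 51,000 / 251,000 pages as a worst case before caching.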

I'm not sure I'd trust the numbers very far - but you can see the sort of data that is necessary to improve the estimates.
