Planning a folder structure for thousands of files

Posted 2024-12-03 22:05:32


The question: is a deep folder structure better, or fewer subfolders holding thousands of files each?

The problem:
I have a VB.NET program that generates around 2,500 XML files per year (roughly 100 KB per file).
I have to store the files on a file server (Windows 7 or a NAS).
Around 30 PCs on the network use that program.

I am looking for the best way to plan the folder structure on the file server, with the goal of a clean, human-readable hierarchy combined with fast access to the files.

In the past I made a similar program with the following structure:

\fileserver\PC1\year\months\file00001.xml

in other words: a folder for each PC on the LAN,
then a subfolder per year,
then a subfolder per month,
and in the month folder the files generated that month
(of course the filename carries a special stamp)
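A minimal sketch of how such a per-PC/year/month path could be assembled (the helper name, share root, and zero-padding scheme are illustrative assumptions, not the author's actual code):

```python
import ntpath  # Windows-style path joining, regardless of host platform

def build_path(root, pc, year, month, seq):
    """Build a path like \\fileserver\PC1\2011\03\file00001.xml.
    Hypothetical helper mirroring the layout described above."""
    return ntpath.join(root, pc, f"{year:04d}", f"{month:02d}",
                       f"file{seq:05d}.xml")

print(build_path(r"\\fileserver", "PC1", 2011, 3, 1))
```

Dropping the month level, as proposed below, would simply omit the `f"{month:02d}"` component.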

This way I ended up with nearly 200 files per month.
The program has run for years without problems.

But now I would like to remove the "MONTH" subfolder so that all files generated by a PC in the current year sit together in the year subfolder, like this:

\fileserver\PC1\year\file00001.xml

This solution would produce a clearer folder tree, but more files per folder.
I do not know whether this could become a speed issue when accessing the files from VB.NET programs or other third-party applications.

Which folder structure would you choose?

Thanks for replying.


Answers (2)

羁拥 2024-12-10 22:05:32


If you use NTFS, then measurements show that a flat structure works faster than dealing with subdirectories, but the difference is minimal (maybe 1% or even less; I don't have the numbers at hand right now).

Update: For a single file access, fewer lookups are involved, so subdirectories offer better performance. But if you access your files randomly, then over time more and more files get touched, and the OS has to scan all the directories and load them into memory. When it comes to processing large numbers of files, subdirectories tend to become slower. Also, on NTFS, which keeps an index of file names, opening a particular file is quite fast, and walking through subdirectories can be even slower than opening the file from the same folder.

To summarize: speed depends heavily on the usage scenario. I too believed that grouping files into subdirectories would bring significant benefits, until I ran tests. NTFS performed much better with hundreds of thousands of files in one folder than one would expect. So I'd recommend running your own tests in your particular usage scenario.
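A rough sketch of such a self-test (the file counts, sizes, and function names here are placeholders, not the answerer's setup): create small files in a flat layout and in a nested layout, then time random reads against each.

```python
import os
import random
import tempfile
import time

def make_files(root, subdirs, files_per_dir, size=1024):
    """Create `subdirs` folders under root, each with `files_per_dir` small files."""
    for d in range(subdirs):
        folder = os.path.join(root, f"{d:02d}")
        os.makedirs(folder, exist_ok=True)
        for i in range(files_per_dir):
            with open(os.path.join(folder, f"file{i:05d}.dat"), "wb") as f:
                f.write(os.urandom(size))  # dummy payload

def time_random_reads(root, samples=100):
    """Open and read `samples` randomly chosen files under root; return elapsed seconds."""
    paths = [os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root) for name in names]
    picks = random.choices(paths, k=samples)
    start = time.perf_counter()
    for p in picks:
        with open(p, "rb") as f:
            f.read()
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    flat = os.path.join(tmp, "flat")
    deep = os.path.join(tmp, "deep")
    make_files(flat, subdirs=1, files_per_dir=300)   # one big folder
    make_files(deep, subdirs=12, files_per_dir=25)   # month-like subfolders
    print(f"flat: {time_random_reads(flat):.4f}s  deep: {time_random_reads(deep):.4f}s")
```

For a meaningful result, run this against the actual file server share rather than a local temp directory, with file counts matching the real workload; local-disk timings will not reflect SMB/NAS behavior.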

狂之美人 2024-12-10 22:05:32


Following up on the answer I accepted, I ran some tests to find an answer to my own question.

I created a folder with 3000 files to simulate the flat structure. Then I created a folder divided into 12 subfolders of 250 files each to simulate the deep tree structure.

Then I wrote some simple VB6 code that reads 100 files from each layout and copies the binary data into an array. The file names were chosen randomly. I repeated the loop 10 times and calculated the average time.

Here is the code for the flat folder.

dtTot = 0
For j = 1 To 10

    dtStart = GetTickCount

    For i = 1 To 100
        iFileNum = FreeFile
        iNr = Int(2999 * Rnd + 1)               ' random file index

        sFilename = sROOT & "2010\" & "raw (" & CStr(iNr) & ").dat"

        iNCount = FileLen(sFilename) / 4        ' number of Longs in the file
        ReDim lVetRawData(iNCount)

        Open sFilename For Binary Access Read As #iFileNum
        Get #iFileNum, , lVetRawData
        Close #iFileNum

    Next i

    dtEnd = GetTickCount
    dtTot = dtTot + dtEnd - dtStart

Next j

I got the following results:

deep folder on NTFS: 162.5 ms

flat folder on NTFS: 196.9 ms

deep folder on NAS: 280.2 ms

flat folder on NAS: 340.7 ms

where the NTFS server is a Windows 2003 Pentium machine and the NAS is a Synology DS210j (Linux-based).

I repeated the test under different network conditions and got nearly identical values.

I hope I did not make any logical mistakes. This is not a precise measurement, but the test reproduces exactly the kind of access my code has to perform: in all cases the deep folder structure seemed to be the faster one in my test environment.
