How can I improve the performance of this MFC code?
I'm doing a file search with an exception list for directories. The problem is that the code below recursively iterates through all files on the hard drive. It works, but it is slow, so I need help optimizing its performance. Thanks in advance.
CFileFind finder;

// build a string with wildcards
CString strWildcard(directory);
strWildcard += _T("\\*.*");

// start working for files
BOOL bWorking = finder.FindFile(strWildcard);
while (bWorking)
{
    bWorking = finder.FindNextFile();
    if (finder.IsDots())
        continue;

    // if it's a directory, recursively search it
    if (finder.IsDirectory())
    {
        CString str = finder.GetFilePath();
        if (NULL == m_searchExceptions.Find(str))
        {
            _recursiveSearch(str);
        }
        else
        {
            continue;
        }
    }

    // basic comparison, can be replaced by a strategy pattern if a more complicated comparison is required (e.g. regex)
    if (0 == finder.GetFileName().CompareNoCase(m_searchPattern))
    {
        if (m_currentSearchResults.Find(finder.GetFilePath()) == NULL)
        {
            m_currentSearchResults.AddHead(finder.GetFilePath());
        }
    }
}
6 Answers
Looks like your m_currentSearchResults is a list, and each time you find a file name you look it up to see whether it is already in the list. When you have lots of found files (say hundreds), this can become a bottleneck, since it has O(N^2) complexity. If that's the case, consider using a CMap instead, as it gives you O(log N) search (a set would be even more appropriate than a map, but MFC doesn't have one; you could use the standard library's std::set instead).
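A minimal sketch of that suggestion, assuming the member containers can be swapped out (the struct and helper names below are invented for the example; they are not the poster's):

#include <atlstr.h>  // CString
#include <set>

// std::set rejects duplicates on insert and looks things up in O(log N),
// so no linear Find() pass over the container is needed.
struct SearchState
{
    std::set<CString> searchExceptions;      // directories to skip
    std::set<CString> currentSearchResults;  // found paths
};

bool IsExcluded(const SearchState& state, const CString& dir)
{
    return state.searchExceptions.count(dir) != 0;
}

void AddResult(SearchState& state, const CString& path)
{
    // insert() is a no-op when the path is already present, replacing the
    // explicit Find()-then-AddHead() pair in the original loop.
    state.currentSearchResults.insert(path);
}

The default CString ordering is case-sensitive; if case-insensitive handling of paths matters, a custom comparator can be supplied to std::set.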
How slow? Did you profile it? If you're recursively searching files on your hard disk it's extremely likely you're I/O bound and there's nothing you can do short of getting faster storage hardware (like solid state).
I don't think you're going to be able to optimize performance here. No matter what you do in terms of optimization on your end, you're going to spend 80+% of your time inside FindFirstFile and FindNextFile (Windows API calls). I asked a similar question already and have yet to get an answer.
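If the time really is dominated by those calls, one thing worth measuring is dropping CFileFind for the raw API. A hedged sketch (the function name is made up for illustration): FindExInfoBasic skips the 8.3 short-name lookup and FIND_FIRST_EX_LARGE_FETCH asks for a larger directory buffer, which can trim per-call overhead on big scans even though the disk remains the dominant cost.

#include <windows.h>
#include <tchar.h>
#include <atlstr.h>  // CString

void EnumerateDirectory(const CString& directory)
{
    WIN32_FIND_DATA findData;
    CString pattern = directory + _T("\\*");

    HANDLE hFind = ::FindFirstFileEx(pattern,
                                     FindExInfoBasic,          // skip 8.3 short names
                                     &findData,
                                     FindExSearchNameMatch,
                                     NULL,
                                     FIND_FIRST_EX_LARGE_FETCH);
    if (hFind == INVALID_HANDLE_VALUE)
        return;

    do
    {
        // Name and attributes arrive in findData, so no extra per-file calls
        // are needed to tell files from directories.
        if (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            // skip "." and "..", check the exception list, then recurse
        }
        else
        {
            // compare findData.cFileName against the search pattern
        }
    } while (::FindNextFile(hFind, &findData));

    ::FindClose(hFind);
}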
You're doing a general search for a file. There are a million products out there that do this well, and they all use indexing as an optimization. The weak link here is certainly your disk, not your code. Comparing 1,000,000 strings will take no time at all compared to the time it takes to enumerate 1,000,000 files on disk.
There are two fundamental performance issues here: hard drive access and directory traversal. You may be able to optimize both.
Hard Drive Optimization
A hard drive at rest tends to stay at rest. A spinning platter likes to keep spinning. That said, the bottlenecks in hard drive access are spin-up time, seek time and read time. Reducing the number of accesses and increasing the amount of data per read will improve your performance.
Memory access is faster than hard drive access. So haul large chunks of data into memory, then search memory.
Optimizing Directory Search.
Imagine, if you would, a tree of "pages". Each node in the tree is a directory of zero or more directories or files. Unfortunately, in most OS's, this data structure is not optimized for efficient searching.
The ideal situation is to haul in all the relevant directories into memory then search them (in memory). Once the location of the file is known, random access to the file is relatively quick. The problem is reducing search time by only reading the relevant directories; i.e. reducing the number of irrelevant directory reads.
Most applications that perform file searching on a hard drive read the drive and create their own optimized data structure(s). This may not be optimal for huge hard drives with enormous quantities of files, or for cases where only a few searches are ever performed.
If you can, tell the OS to keep as many directories in memory as possible.
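As a rough illustration of the "build your own structure" idea, the sketch below (class and member names are invented for the example, and it uses the C++17 std::filesystem library rather than MFC) walks the tree once and answers later lookups from memory; only the initial build touches the disk.

#include <filesystem>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical index: file name -> every path where that name was seen.
class FileIndex
{
public:
    // One pass over the disk; this is the expensive step.
    void Build(const fs::path& root)
    {
        std::error_code ec;
        fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec);
        for (const auto& entry : it)
        {
            if (entry.is_regular_file(ec))
                m_byName[entry.path().filename().wstring()].push_back(entry.path());
        }
    }

    // Later searches are in-memory map lookups, with no disk access.
    std::vector<fs::path> Find(const std::wstring& name) const
    {
        auto found = m_byName.find(name);
        return found != m_byName.end() ? found->second : std::vector<fs::path>();
    }

private:
    std::map<std::wstring, std::vector<fs::path>> m_byName;
};

The obvious trade-off is staleness: the index has to be rebuilt (or kept up to date, e.g. with ReadDirectoryChangesW) when files change, which is what dedicated indexing products do.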
Improving Performance: Reducing other applications.
For some applications, perceived performance depends on the other applications running at the same time. Running a compiler and an internet search concurrently will slow down most other applications. So try to stop other applications that don't need to run concurrently with yours. Also, consider raising the priority of your application.
+1 for profiling it first to be sure. Also, this seems like a problem that could be solved with the Task Parallel Library - launch a task as you see each directory, and use all those cores on your CPU.
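The Task Parallel Library is a .NET facility; a rough C++ analogue of the same per-directory-task idea, using std::async and std::filesystem (function and parameter names are illustrative), might look like this. Whether it helps depends on the drive: on a single spinning disk the extra threads mostly end up competing for the same head.

#include <filesystem>
#include <future>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One task per subdirectory, results merged afterwards. A sketch of the idea,
// not a tuned design: an unbounded task-per-directory fan-out can
// oversubscribe the machine on very deep trees.
std::vector<fs::path> SearchDirectory(const fs::path& dir, const std::wstring& name)
{
    std::vector<fs::path> results;
    std::vector<std::future<std::vector<fs::path>>> subTasks;

    std::error_code ec;
    for (const auto& entry : fs::directory_iterator(dir, ec))
    {
        if (entry.is_directory(ec))
        {
            // Each subdirectory becomes its own task.
            subTasks.push_back(std::async(std::launch::async,
                                          SearchDirectory, entry.path(), name));
        }
        else if (entry.path().filename() == name)
        {
            results.push_back(entry.path());
        }
    }

    // Collect the subdirectory results.
    for (auto& task : subTasks)
    {
        std::vector<fs::path> sub = task.get();
        results.insert(results.end(), sub.begin(), sub.end());
    }
    return results;
}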