Synchronizing data access between multiple threads
I'm trying to implement multithreaded, recursive file search logic in Visual C++. The logic is as follows:
Threads 1 and 2 start at a directory location and match the files in that directory against the search criteria. If they find a child directory, they add it to a work queue. Once a thread finishes with the files in a directory, it grabs another directory path from the work queue. The work queue is an STL stack class guarded with CriticalSections for the push(), pop(), and top() calls.
If the stack is empty at any point, the threads wait for a very short time before retrying. When all the threads are in the waiting state, the search is marked as complete.
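For reference, here is a minimal sketch of a shared work stack along the lines described above. It is not the code from the question: it assumes std::mutex in place of a Win32 CRITICAL_SECTION, and it swaps the sleep-and-retry loop for a std::condition_variable plus a counter of unfinished directories. The names WorkStack, push, pop, and done are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stack>
#include <string>
#include <utility>

// Hypothetical shared work stack of directories waiting to be scanned.
// Push the root directory before starting any worker thread; a pop() on a
// stack that never received work reports "finished" immediately.
class WorkStack {
public:
    // Add a directory. Each pushed directory counts as one unfinished item.
    void push(std::string dir) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(dir));
            ++unfinished_;
        }
        ready_.notify_one();
    }

    // Block until a directory is available or the whole search is finished.
    // Returns false once every pushed directory has been marked done.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || unfinished_ == 0; });
        if (items_.empty()) return false;
        out = std::move(items_.top());
        items_.pop();
        return true;
    }

    // Call after a popped directory has been fully scanned, that is, after
    // all of its subdirectories have been pushed.
    void done() {
        bool finished = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(unfinished_ > 0);
            finished = (--unfinished_ == 0);
        }
        if (finished) ready_.notify_all();  // wake waiting workers so they can exit
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::stack<std::string> items_;
    std::size_t unfinished_ = 0;  // pushed but not yet marked done
};
```

A worker would loop on pop(), scan the directory it receives, push() each subdirectory it finds, and then call done(). The search is complete when pop() returns false, which replaces the "all threads are waiting" check.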
This logic works without any problems, but I feel I'm not getting the full potential of threads, because there is no drastic performance gain compared to using a single thread. I suspect the work stack is the bottleneck, but I can't figure out how to do away with the locking. I tried another variation where each thread has its own stack and adds a work item to the global stack only when its local stack grows past a fixed number of work items. If the local stack is empty, the thread tries to fetch from the global stack. I didn't find a noticeable difference even with this variation. Does anyone have suggestions for improving the synchronization logic?
Regards,
Comments (2)
I really doubt that your work stack is the bottleneck. The disk only has one head, and can only read one stream of data at a time. As long as your threads are processing the data as fast as the disk can supply it, there's not much else you can do that's going to have any significant effect on overall speed.
For other types of tasks your queue might become a significant bottleneck, but for this task, I doubt it. Keep in mind the time scales of the operations here. A simple operation that happens inside of a CPU takes considerably less than a nanosecond. A read from main memory takes on the order of tens of nanoseconds. Something like a thread switch or synchronization takes on the order of a couple hundred nanoseconds or so. A single head movement on the disk drive takes on the order of a millisecond or so (1,000,000 nanoseconds).
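To put those orders of magnitude side by side, here is a small sketch of the arithmetic. The constants are the ballpark figures quoted above, not measurements, and the names are made up for illustration.

```cpp
#include <cassert>

// Ballpark costs from the paragraph above, in nanoseconds (not measured).
constexpr long long kSyncNs = 200;      // "a couple hundred nanoseconds" per synchronization
constexpr long long kSeekNs = 1000000;  // "a millisecond or so" per disk head movement

// Number of synchronization operations that fit in the time taken by
// `seeks` head movements.
inline long long syncs_in_seek_time(long long seeks) {
    assert(seeks >= 0);
    return seeks * kSeekNs / kSyncNs;
}
```

With these figures, syncs_in_seek_time(1) is 5000: a thread could take and release the stack lock thousands of times while the disk performs a single seek, which is why the lock is unlikely to dominate here.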
In addition to @Jerry's answer, your bottleneck is the disk system. If you have a RAID array you might see some moderate improvement from using 2 or 3 threads.
If you have to search multiple drives (note: physical drives, not volumes on a single physical drive) you can use extra threads for each of them.
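As a rough sketch of the one-thread-per-physical-drive idea, assuming std::thread rather than Win32 threads: the function name search_drives and the search_root callback are hypothetical, and the callback stands in for whatever per-directory search you already have.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Runs `search_root` once per root path, each call on its own thread, and
// returns the match count reported for each root, in the same order.
// Assumes each root lives on a different physical drive and that
// `search_root` does not throw.
std::vector<std::size_t> search_drives(
    const std::vector<std::string>& roots,
    const std::function<std::size_t(const std::string&)>& search_root) {
    assert(static_cast<bool>(search_root));
    std::vector<std::size_t> counts(roots.size(), 0);
    std::vector<std::thread> threads;
    threads.reserve(roots.size());
    for (std::size_t i = 0; i < roots.size(); ++i) {
        // Each thread writes only to its own slot, so no lock is needed.
        threads.emplace_back([&counts, &roots, &search_root, i] {
            counts[i] = search_root(roots[i]);
        });
    }
    for (std::thread& t : threads) t.join();
    return counts;
}
```

Because the drives do not share a head, the threads never compete for the same seek, and no shared work stack is needed between them.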