Windows SetThreadAffinityMask has no effect
I have written a small test program in which I try to use the Windows API call SetThreadAffinityMask to lock the thread to a single NUMA node. I retrieve the CPU bitmask of a node with the GetNumaNodeProcessorMask API call, then pass that bitmask to SetThreadAffinityMask along with the thread handle returned by GetCurrentThread. Here is a greatly simplified version of my code:
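```cpp
#include <windows.h>

// Inside a function called from a boost::thread
ULONGLONG nodeMask = 0;
GetNumaNodeProcessorMask(1, &nodeMask);   // processor bitmask of NUMA node 1
HANDLE thread = GetCurrentThread();       // pseudo-handle for the calling thread
SetThreadAffinityMask(thread, static_cast<DWORD_PTR>(nodeMask)); // sets the calling thread's affinity
DoWork(); // make-work function
```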
I of course check whether the API calls return 0 in my code, and I've also printed out the NUMA node mask and it is exactly what I would expect. I've also followed advice given elsewhere and printed out the mask returned by a second identical call to SetThreadAffinityMask, and it matches the node mask.
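To cross-check the printed node mask against the per-CPU graphs in Resource Monitor, it can help to expand the mask into processor indices and to confirm that it is a valid thread mask for the process. The following is a minimal sketch; `cpus_in_mask` and `is_valid_thread_mask` are hypothetical helpers, and it assumes at most 64 logical processors (a single processor group), which is all a single `SetThreadAffinityMask` mask can express.

```cpp
#include <cstdint>
#include <vector>

// Expands an affinity mask (e.g. the value filled in by
// GetNumaNodeProcessorMask) into the logical processor indices it names.
std::vector<unsigned> cpus_in_mask(std::uint64_t mask) {
    std::vector<unsigned> cpus;
    for (unsigned cpu = 0; cpu < 64; ++cpu) {
        if ((mask >> cpu) & 1u) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Per the Win32 documentation, a thread affinity mask must be a subset of the
// process affinity mask; an empty mask is rejected as well.
bool is_valid_thread_mask(std::uint64_t threadMask, std::uint64_t processMask) {
    return threadMask != 0 && (threadMask & ~processMask) == 0;
}
```

For example, a mask of 0xF0 expands to processors 4, 5, 6 and 7. On Windows, the process mask would come from `GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask)`, with both outputs declared as `DWORD_PTR`.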
However, from watching the resource monitor when the DoWork function executes, the work is split among all cores instead of only those it is ostensibly bound to. Are there any trip-ups I may have missed when using SetThreadAffinityMask? I am running Windows 7 Professional 64-bit, and the DoWork function contains a loop parallelized with OpenMP which performs operations on the elements of three very large arrays (which combined are still able to fit in the node).
Edit: To expand on the answer given by David Schwartz, on Windows any threads spawned with OpenMP do NOT inherit the affinity of the thread which spawned them. The problem lies there, not with SetThreadAffinityMask.
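Given that behavior, one workaround is to make the affinity call from inside the parallel region, so that it runs once in every OpenMP thread rather than only in the thread that entered the region. A minimal sketch, assuming an OpenMP-capable compiler such as MSVC and at most 64 logical processors; `pin_current_thread` and `pinned_sum` are hypothetical names, and `pinned_sum` merely stands in for the real DoWork. On non-Windows builds the pinning function is a stub that reports failure, so the sketch still compiles and runs there.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Pins the calling thread to the processors in mask; returns false on failure.
bool pin_current_thread(std::uint64_t mask) {
#ifdef _WIN32
    // SetThreadAffinityMask returns 0 when it fails.
    return SetThreadAffinityMask(GetCurrentThread(),
                                 static_cast<DWORD_PTR>(mask)) != 0;
#else
    (void)mask;
    return false;  // stub: this sketch only pins threads on Windows
#endif
}

// Stand-in for DoWork: sums an array inside an OpenMP parallel region.
// The affinity call is inside the region, so every thread of the team
// pins itself to nodeMask before it starts on its share of the loop.
double pinned_sum(const std::vector<double>& data, std::uint64_t nodeMask,
                  int& pinFailures) {
    double sum = 0.0;
    int failures = 0;
#pragma omp parallel reduction(+ : sum, failures)
    {
        if (!pin_current_thread(nodeMask)) {
            failures += 1;  // counts threads whose affinity call failed
        }
        // Iterations are divided among the threads of the team.
#pragma omp for
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(data.size()); ++i) {
            sum += data[i];
        }
    }
    pinFailures = failures;
    return sum;
}
```

Many OpenMP runtimes keep their worker threads alive between parallel regions, so pinning once is often enough, but the OpenMP specification does not promise that the same threads are reused; pinning at the top of each region is the conservative choice.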
Comments
Did you confirm that the particular thread whose affinity mask you set was actually running on a core in another NUMA node? If not, it's working as intended: you are setting the processor mask on one thread and then observing the behavior of a group of threads.
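One way to perform that check is to have a thread report which processor it is currently executing on and test that number against the node mask. A minimal sketch, assuming Windows Vista or later (for `GetCurrentProcessorNumber`) and no more than 64 logical processors; `cpu_in_mask` and `running_inside` are hypothetical helper names, and the Windows-only part is compiled only when `_WIN32` is defined.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// True if logical processor `cpu` is one of the processors named in `mask`.
// The bounds check comes first so the shift never reaches 64.
bool cpu_in_mask(std::uint64_t mask, unsigned cpu) {
    return cpu < 64 && ((mask >> cpu) & 1u) != 0;
}

#ifdef _WIN32
// Windows-only: samples the processor the calling thread is on right now.
// GetCurrentProcessorNumber reports the number within the thread's processor
// group, which matches the mask's bit index when there is a single group.
bool running_inside(std::uint64_t nodeMask) {
    return cpu_in_mask(nodeMask, GetCurrentProcessorNumber());
}
#endif
```

Calling `running_inside(nodeMask)` periodically from the thread that called `SetThreadAffinityMask`, and separately from inside the OpenMP loop, would distinguish the pinned thread from the unpinned workers. A single call is only a snapshot, since the scheduler can move an unpinned thread at any time.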