How do I reserve a core for one thread on Windows?

Published 2024-10-23 11:32:33

I am working on a very time sensitive application which polls a region of shared memory taking action when it detects a change has occurred. Changes are rare but I need to minimize the time from change to action. Given the infrequency of changes I think the CPU cache is getting cold. Is there a way to reserve a core for my polling thread so that it does not have to compete with other threads for either cache or CPU?

Comments (5)

兲鉂ぱ嘚淚 2024-10-30 11:32:33

Thread affinity alone (SetThreadAffinityMask) will not be enough. It does not reserve a CPU core; rather the opposite: it binds the thread to only the cores that you specify (which is not the same thing!).

By constraining the CPU affinity, you reduce the likelihood that your thread will run. If another thread with a higher priority runs on the same core, your thread will not be scheduled until that other thread is done (this is how Windows schedules threads).
Without constraining affinity, your thread has a chance of being migrated to another core (taking the last time it was run as a metric for that decision). Thread migration is undesirable if it happens often and soon after the thread has run (or while it is running), but it is a harmless, even beneficial thing if a couple of dozen milliseconds have passed since the thread was last scheduled (the caches will have been overwritten by then anyway).

You can "kind of" assure that your thread will run by giving it a higher priority class (no guarantee, but high likelihood). If you then use SetThreadAffinityMask as well, you have a reasonable chance that the cache is always warm on most common desktop CPUs (which luckily are normally VIPT and PIPT). For the TLB, you will probably be less lucky, but there's nothing you can do about it.

The problem with a high-priority thread is that it will starve other threads, because Windows scheduling serves higher priority classes first, and as long as those are not satisfied, lower classes get zero CPU time. So the solution in this case must be to block; otherwise, you may impair the system in an unfavorable way.

Try this:

  • create a semaphore and share it with the other process
  • set priority to THREAD_PRIORITY_TIME_CRITICAL
  • block on the semaphore
  • in the other process, after writing data, call SignalObjectAndWait on the semaphore with a timeout of 1 (or even zero timeout)
  • if you want, you can experiment binding them both to the same core

This gives you a thread that will be the first (or among the first) to get CPU time whenever it is runnable, but which spends most of its time blocked rather than running.
When the writer thread calls SignalObjectAndWait, it atomically signals and blocks (even a wait of "zero time" is enough to reschedule). The other thread will wake on the semaphore and do its work. Thanks to its high priority, it will not be interrupted by other "normal" (that is, non-realtime) threads. It will keep hogging CPU time until done, and then block again on the semaphore. At this point, SignalObjectAndWait returns.
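In outline, the two sides of that recipe might look like this (a hedged sketch, not a tested implementation; the semaphore name "Local\shm_changed" and the writer's wait object are placeholders):

```c
#include <windows.h>

/* Poller process: highest priority, but spends its life blocked. */
DWORD WINAPI poller(LPVOID unused)
{
    /* Named semaphore shared with the writer process, initial count 0.
       The name is a placeholder. */
    HANDLE sem = CreateSemaphoreA(NULL, 0, 1, "Local\\shm_changed");
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

    for (;;) {
        WaitForSingleObject(sem, INFINITE);  /* block on the semaphore */
        /* ... react to the change in shared memory here ... */
    }
}

/* Writer process: after writing the data, signal and yield atomically.
   hAnyWaitable is a placeholder for whatever object the writer can wait
   on; even a zero timeout is enough to let the poller be rescheduled. */
void notify_poller(HANDLE sem, HANDLE hAnyWaitable)
{
    SignalObjectAndWait(sem, hAnyWaitable, 0, FALSE);
}
```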

酒解孤独 2024-10-30 11:32:33

Using the Task Manager, you can set the "affinity" of processes.

You would have to set the affinity of your time-critical app to core 4, and the affinity of all the other processes to cores 1, 2, and 3. Assuming four cores of course.

过潦 2024-10-30 11:32:33

You could call SetProcessAffinityMask on every process but yours with a mask that excludes just the core that will "belong" to your process, and set your process to run only on that core (or, even better, use SetThreadAffinityMask on just the thread that does the time-critical task).
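A rough sketch of what that could look like (the masks assume a four-core machine with core 3 being reserved; error handling omitted):

```c
#include <windows.h>

/* In the time-critical process: run the hot thread on core 3 only. */
void pin_polling_thread(HANDLE hThread)
{
    SetThreadAffinityMask(hThread, 1 << 3);   /* 0b1000 -> core 3 only */
}

/* For every other process (needs a handle opened with
   PROCESS_SET_INFORMATION access): allow cores 0-2, exclude core 3. */
void evict_from_core3(HANDLE hProcess)
{
    SetProcessAffinityMask(hProcess, 0x7);    /* 0b0111 -> cores 0-2 */
}
```

Note that a process mask must stay within the system affinity mask, and newly started processes will not inherit the exclusion, so this has to be reapplied as processes come and go.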

海螺姑娘 2024-10-30 11:32:33

Given the infrequency of changes I think the CPU cache is getting cold.

That sounds very strange.

Let's assume your polling thread and the writing thread are on different cores.

The polling thread will be reading the shared memory address and so will be caching the data. That cache line is probably marked as exclusive. Then the write thread finally writes; first, it reads the cache line of memory in (so that line is now marked as shared on both cores) and then it writes. Writing causes the cache line in the polling thread's CPU to be marked as invalid. The polling thread then comes to read again; if it reads while the writing thread still has the data cached, it will read from the second core's cache, invalidating that cache line and taking ownership for itself. There is a lot of bus-traffic overhead in doing this.
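The ping-ponging described above happens on the flag's cache line, so it at least helps to keep that flag on a line of its own. A minimal, portable sketch of such a poll flag (C11 atomics; the 64-byte line size and all names here are assumptions, not from the original post):

```c
#include <stdatomic.h>
#include <stdalign.h>

/* Keep the flag on its own cache line so unrelated data written by the
   producer doesn't bounce the same line (64-byte lines assumed). */
static alignas(64) atomic_int changed;

/* Producer: publish a change (release pairs with the acquire below). */
void publish_change(void)
{
    atomic_store_explicit(&changed, 1, memory_order_release);
}

/* Consumer: spin until the flag flips, then consume it. Returns 1. */
int wait_for_change(void)
{
    while (!atomic_load_explicit(&changed, memory_order_acquire))
        ;  /* a real loop would add a pause/yield hint here */
    atomic_store_explicit(&changed, 0, memory_order_relaxed);
    return 1;
}
```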

Another issue is that the writing thread, if it doesn't write often, will almost certainly lose the TLB entry for the page with the shared memory address. Recalculating the physical address is a long, slow process. Since the polling thread polls often, that page is probably always in that core's TLB; in that sense, you might well do better, in latency terms, to have both threads on the same core. (Although if they're both compute-intensive, they might interfere destructively, and that cost could be much higher - I can't know, as I don't know what the threads are doing.)

One thing you could do is use a hyperthread on the writing thread core; if you know early on you're going to write, get the hyperthread to read the shared memory address. This will load the TLB and cache while the writing thread is still busy computing, giving you parallelism.
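That idea could be sketched roughly as follows (a hedged sketch only; the affinity mask, the `shared_region` pointer, and the assumption of an x86 CPU with the writer pinned to logical core 0 are all placeholders):

```c
#include <windows.h>
#include <xmmintrin.h>   /* _mm_prefetch; x86/x64 assumed */

extern volatile const char *shared_region;  /* placeholder for the mapping */

/* Run on the logical core that is the writer core's hyperthread sibling
   (mask 0x2 assumes the writer is pinned to logical core 0). As soon as
   the writer knows a write is coming, this warms the shared L1/L2 cache,
   and the plain read also pulls the page translation into the TLB. */
DWORD WINAPI warm_shared_page(LPVOID unused)
{
    SetThreadAffinityMask(GetCurrentThread(), 0x2);
    _mm_prefetch((const char *)shared_region, _MM_HINT_T0);
    (void)*shared_region;
    return 0;
}
```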

九公里浅绿 2024-10-30 11:32:33

The Win32 function SetThreadAffinityMask() is what you are looking for.
