Performance of running a multithreaded program on a single core by setting affinity?
In short:
Under what scenarios can running a multithreaded app on a single core destroy performance?
What about setting the affinity of a multithreaded app to only use one core?
In long:
I'm trying to run the physics of a 2D engine on its own thread. It works, and at first performance seemed normal, but then I told the game to try to run at 10K FPS with the physics at 120 FPS, went into Task Manager, and set the affinity so that the program could only use one core.
The FPS was at ~1700 before setting the affinity to one core; afterwards it went to ~70 FPS. I didn't expect that kind of decrease. I then told the game to try to run at 300 FPS with the physics at 60 FPS.
Same thing happened.
I didn't give it much thought and just continued modifying the engine. I tested it again later after changing some of the drawing code, at 300 FPS with 60 FPS for physics. With all cores allowed it managed 300 FPS just fine; with affinity set to a single core, FPS dropped to 4. Now, I know running a multithreaded app on a single core can't possibly be that bad, or am I ignorant of something that happens when you set the affinity to a single core?
This is about how the rendering/physics runs...
Loop starts
Gather input until (1.0 / FPS) has passed.
Call update.
Lock physics thread mutex because things in the game will be using the physics data and I don't want the engine updating anything until everything in this update call finishes.
Update everything in the game, which may send Draw function objects (each holds what to draw, where to draw, and how to draw) to the Render queue.
Unlock mutex.
Renderer calls operator() on each function object and removes them from queue.
Update screen.
Repeat loop.
Physics thread loop:
    ALLEGRO_TIMER* timer(al_create_timer(1.0f / 60.0f));
    double prevCount(0);
    al_start_timer(timer);
    while(true)
    {
        auto_mutex lock(m_mutex);
        if(m_shutdown)
            break;
        if (!m_allowedToStep)
            continue;
        // Don't run too fast. This isn't final, just simple test code.
        if (!(al_get_timer_count(timer) > prevCount))
            continue;
        prevCount = al_get_timer_count(timer);
        m_world->Step(1.0f / 60.0f, 10, 10);
        m_world->ClearForces();
    }
// Note: auto_mutex is just a really simple object I created to lock a mutex in its constructor and unlock it in its destructor. I'm using Allegro 5's threading functions.
Comments (2)
In both cases, the answer is much the same. If your program is running on a single core, then only one thread runs at a time. And that means that any time one thread has to wait for another, you need the OS to perform a context switch, which is a fairly expensive operation.
When run on multiple cores, there's a decent chance that the two threads that need to interact will both be running simultaneously, and so the OS won't need to perform a context switch for your code to proceed.
So really, code which requires a lot of inter-thread synchronization is going to run slower on a single core.
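To make the cost of inter-thread synchronization concrete, here is a minimal sketch in standard C++ (not Allegro) of two threads that must take strict turns. The function name ping_pong and everything inside it are hypothetical and only illustrate the idea. Every handoff means one thread has to wait for the other; with affinity limited to one core, each of those waits requires the OS to switch threads before any progress is made.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Two threads alternate strictly: "ping" may only proceed when it is ping's
// turn, "pong" only when it is pong's turn. Returns the total number of
// handoffs, which is always 2 * rounds.
int ping_pong(int rounds) {
    assert(rounds >= 0);
    std::mutex m;
    std::condition_variable cv;
    bool ping_turn = true;  // ping goes first
    int handoffs = 0;

    auto player = [&](bool is_ping) {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            // Sleep inside the OS until the other thread hands over the turn.
            cv.wait(lock, [&] { return ping_turn == is_ping; });
            ping_turn = !is_ping;  // pass the turn to the other thread
            ++handoffs;
            cv.notify_one();
        }
    };

    std::thread ping(player, true);
    std::thread pong(player, false);
    ping.join();
    pong.join();
    return handoffs;
}
```

Timing a call such as ping_pong(100000) once with all cores allowed and once with the process pinned to a single core is one way to see how much of the slowdown comes from the handoffs themselves.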
But you can make it worse.
A spinlock, or any kind of busy-waiting loop, will absolutely destroy performance. And it should be obvious why. You can only run one thread at a time, so if you need a thread to wait for some event, you should tell the OS to put it to sleep immediately, so that another thread can run.
If instead you just do some "while condition is not met, keep looping" busy loop, you're keeping the thread running, even though it has nothing to do. It'll continue looping until the OS decides that its time is up, and it schedules another thread. (And if the thread doesn't get blocked by something, it'll typically be allowed to run for upwards of 10 milliseconds at a time.)
In multithreaded programming in general, and especially in multithreaded code running on a single core, you need to play nice, and not hog the CPU core more than necessary. If you have nothing sensible to do, allow another thread to run.
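As a rough illustration of the difference, the sketch below puts a busy-wait next to a blocking wait. It uses standard C++ primitives rather than Allegro's, and the names wait_spinning, Event and wait_for_worker are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Busy-wait: polls the flag in a tight loop. The thread stays runnable the
// whole time, so on a single core it holds the CPU until it is preempted.
void wait_spinning(const std::atomic<bool>& ready) {
    while (!ready.load()) {
        // nothing useful to do, yet the core is never given up voluntarily
    }
}

// Blocking wait: the waiting thread is put to sleep by the OS and only
// becomes runnable again once signal() has been called.
struct Event {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);
            ready = true;
        }
        cv.notify_all();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return ready; });
    }
};

// Starts a worker that signals after `delay`, blocks until that happens,
// and reports whether the event was observed as set.
bool wait_for_worker(std::chrono::milliseconds delay) {
    assert(delay.count() >= 0);
    Event ev;
    std::thread worker([&] {
        std::this_thread::sleep_for(delay);
        ev.signal();
    });
    ev.wait();  // sleeps; another thread can use the core in the meantime
    worker.join();
    return ev.ready;
}
```

With wait_spinning, the worker that is supposed to set the flag may not get to run at all on a single core until the spinning thread's time slice ends, whereas Event::wait hands the core over right away.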
And guess what your code is doing.
What do you think the effect of the continue checks in your physics loop is?
RUN THE LOOP!
AM I READY TO RUN YET? NO? THEN RUN THE LOOP AGAIN. AM I READY TO RUN NOW? STILL NO? RUN THE LOOP AGAIN.....
In other words, "I have the CPU now, AND I WILL NEVER SURRENDER! IF SOMEONE ELSE WANTS THE CPU THEY'LL HAVE TO TAKE IT FROM MY COLD DEAD BODY!"
If you have nothing to use the CPU for, then give it up, especially if you have another thread that is ready to run.
Use a mutex or some other synchronization primitive, or if you're ok with a more approximate time-based sleep period, call Sleep(). But if you want any kind of decent performance, don't hog the CPU indefinitely while you're waiting for another thread to do some processing.
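One possible shape for a better-behaved physics loop is sketched below, under a few assumptions: it uses std::this_thread::sleep_until from standard C++ in place of Sleep() or Allegro's timer, World is a stand-in that only counts steps, and run_fixed_steps is a hypothetical name. Note that the loop in the question takes the lock at the top of every iteration, so the mutex is held even while spinning; the sketch locks only around the actual step.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Stand-in for the physics world; Step() here only counts how often it ran.
struct World {
    int steps = 0;
    void Step(float /*dt*/) { ++steps; }
};

// Runs `count` fixed-rate steps, sleeping between them instead of spinning.
// The mutex is held only while the world is actually being stepped, so the
// game thread can take it freely during the sleep.
int run_fixed_steps(World& world, std::mutex& world_mutex,
                    int count, std::chrono::microseconds period) {
    assert(count >= 0);
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    for (int i = 0; i < count; ++i) {
        next += period;
        // Blocks in the OS until the next step is due.
        std::this_thread::sleep_until(next);
        std::lock_guard<std::mutex> lock(world_mutex);
        world.Step(std::chrono::duration<float>(period).count());
    }
    return world.steps;
}
```

For a 60 FPS physics rate the period would be roughly std::chrono::microseconds(16667). Sleep granularity depends on the OS, so the timing is approximate, which matches the "more approximate time-based sleep period" caveat above.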
When you look at a processor, don't look at it like a block that just calculates one thing after another. Look at it as a calculator you have to reserve time on.
Windows (and every other operating system) makes sure this time is reserved for all running apps. When you run a program, the computer doesn't just do all the calculations the new program wants; Windows allocates the program a specific amount of time. When that time is over, the next program gets some time. Windows does all this for you, so it's only relevant if you want to understand it.
This does affect how you look at multithreading though, because when Windows looks around and sees a multithreaded application, it will say "I will handle this as 2 separate programs", so it allocates time for both. Therefore one won't completely stop the other from doing calculations.
So no, running your program multithreaded will not TANK its performance, but it will make other programs around it a little slower and create a small amount of overhead. But if you are doing large calculations and they are causing your program to hang, feel free to multithread.