How can I store and push simulation state while minimizing the impact on updates-per-second?

Posted 2024-10-17 07:21:49


My app consists of two threads:

  1. GUI Thread (using Qt)
  2. Simulation Thread

My reason for using two threads is to keep the GUI responsive, while letting the Sim thread spin as fast as possible.

In my GUI thread I'm rendering the entities in the sim at an FPS of 30-60; however, I want my sim to "crunch ahead" - so to speak - and queue up game state to be drawn eventually (think streaming video, you've got a buffer).

Now for each frame of the sim I render I need the corresponding simulation "State". So my sim thread looks something like:

while(1) {
    simulation.update();
    SimState* s = new SimState;
    simulation.getAgents( s->agents ); // store agents
    // store other things to SimState here..
    stateStore.enqueue(s); // stateStore is a QQueue<SimState*>
    if( /* some threshold reached */ )
        // push stateStore
}

SimState looks like:

struct SimState {
    std::vector<Agent> agents;
    //other stuff here
};

And Simulation::getAgents looks like:

void Simulation::getAgents(std::vector<Agent> &a) const
{
    // mAgents is a std::vector<Agent>
    std::vector<Agent> a_tmp(mAgents);
    a.swap(a_tmp);
}

The Agents themselves are somewhat complex classes. The members are a bunch of ints and floats and two std::vector<float>s.

With this current setup the sim can't crunch much faster than the GUI thread is drawing. I've verified that the current bottleneck is simulation.getAgents( s->agents ), because even if I leave out the push the updates-per-second are still slow. If I comment out that line I see several orders of magnitude improvement in updates/second.

So, what sorts of containers should I be using to store the simulation's state? I know there is a bunch of copying going on at the moment, but some of it is unavoidable. Should I store Agent* in the vector instead of Agent?

Note: In reality the simulation isn't in a loop, but uses Qt's QMetaObject::invokeMethod(this, "doSimUpdate", Qt::QueuedConnection); so I can use signals/slots to communicate between the threads; however, I've verified a simpler version using while(1){} and the issue persists.

止于盛夏 2024-10-24 07:21:49


Try re-using your SimState objects (using some kind of pool mechanism) instead of allocating them every time. After a few simulation loops, the re-used SimState objects will have vectors that have grown to the needed size, thus avoiding reallocation and saving time.

An easy way to implement a pool is to initially push a bunch of pre-allocated SimState objects onto a std::stack<SimState*>. Note that a stack is preferable to a queue, because you want to take the SimState object that is most likely to be "hot" in the cache (the most recently used SimState object will be at the top of the stack). Your simulation loop pops SimState objects off the stack and populates them with the computed SimState. These computed SimState objects are then pushed into a producer/consumer queue to feed the GUI thread. After being rendered by the GUI thread, they are pushed back onto the SimState stack (i.e. the "pool"). Try to avoid needless copying of SimState objects while doing all this. Work directly with the SimState object in each stage of your "pipeline".
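
For illustration, here is a rough sketch of the simulation side drawing from such a pool instead of calling new every iteration (statePool, frameQueue and renderedState are made-up names, and synchronization is left out until the next paragraph):

// seed the pool once, up front
std::stack<SimState*> statePool;
for (int i = 0; i < 50; ++i)
    statePool.push(new SimState);

// simulation thread, each iteration:
SimState* s = statePool.top();      // most recently returned object, likely still hot in cache
statePool.pop();
simulation.update();
simulation.getAgents(s->agents);    // s->agents keeps the capacity it grew in earlier frames
// store other things to SimState here..
frameQueue.enqueue(s);              // producer/consumer queue feeding the GUI thread

// GUI thread, after rendering a state:
statePool.push(renderedState);      // hand the object back to the pool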

Of course, you'll have to use the proper synchronization mechanisms in your SimState stack and queue to avoid race conditions. Qt might already have thread-safe stacks/queues. A lock-free stack/queue might speed things up if there is a lot of contention (Intel Threading Building Blocks provides such lock-free queues). Seeing that it takes on the order of 1/50 of a second to compute a SimState, I doubt that contention will be a problem.

If your SimState pool becomes depleted, then it means that your simulation thread is too "far ahead" and can afford to wait for some SimState objects to be returned to the pool. The simulation thread should block (using a condition variable) until a SimState object becomes available again in the pool. The size of your SimState pool corresponds to how many SimStates can be buffered (e.g. a pool of ~50 objects gives you a crunch-ahead time of up to ~1 second).
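
A minimal sketch of such a blocking pool, using std::mutex and std::condition_variable (the class name and its methods are mine, not something from the question or from Qt):

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stack>

class SimStatePool {
public:
    explicit SimStatePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            mPool.push(new SimState);          // pre-allocate the whole buffer up front
    }

    // simulation thread: blocks while every SimState is still in flight
    SimState* acquire() {
        std::unique_lock<std::mutex> lock(mMutex);
        mAvailable.wait(lock, [this] { return !mPool.empty(); });
        SimState* s = mPool.top();             // top of stack = most recently released = cache-friendly
        mPool.pop();
        return s;
    }

    // GUI thread: call this once a state has been rendered
    void release(SimState* s) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mPool.push(s);
        }
        mAvailable.notify_one();
    }

private:
    std::stack<SimState*> mPool;
    std::mutex mMutex;
    std::condition_variable mAvailable;
};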

You can also try running parallel simulation threads to take advantage of multi-core processors. The Thread Pool pattern can be useful here. However, care must be taken that the computed SimStates are enqueued in the proper order. A thread-safe priority queue ordered by time-stamp might work here.
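
For example, a sketch of such an ordering (it assumes you add a tick counter to SimState, which the original struct doesn't have, and it would still need the same kind of locking as the pool above):

#include <queue>
#include <vector>

struct EarliestTickFirst {
    bool operator()(const SimState* a, const SimState* b) const {
        return a->tick > b->tick;   // smallest tick (earliest frame) comes out first
    }
};

// worker threads push computed states here; the consumer pops them in tick order
std::priority_queue<SimState*, std::vector<SimState*>, EarliestTickFirst> orderedStates;

// consumer side: only release a state to the GUI when it is the next expected frame, e.g.
// if (!orderedStates.empty() && orderedStates.top()->tick == nextTickToDraw) { ... }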

Here's a simple diagram of the pipeline architecture I'm suggesting:

[Diagram: pipeline architecture]

(NOTE: The pool and queue hold SimState by pointer, not by value!)

Hope this helps.


If you plan to re-use your SimState objects, then your Simulation::getAgents method will be inefficient. This is because the vector<Agent>& a parameter is likely to already have enough capacity to hold the agent list.

The way you're doing it now would throw away this already allocated vector and create a new one from scratch.

IMO, your getAgents should be:

void Simulation::getAgents(std::vector<Agent> &a) const
{
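    // plain assignment copies into a's existing storage when its capacity is large enough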
    a = mAgents;
}

Yes, you lose exception safety, but you might gain performance (especially with the reusable SimState approach).


Another idea: You could try making your Agent objects fixed-size, by using a C-style array (or boost::array) and a "count" variable instead of std::vector for Agent's float-list members. Simply make the fixed-size array big enough for any situation in your simulation. Yes, you'll waste space, but you might gain a lot of speed.

You can then pool your Agents using a fixed-size object allocator (such as boost::pool) and pass them around by pointer (or shared_ptr). That'll eliminate a lot of heap allocation and copying.
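
A sketch of what that could look like (MAX_SAMPLES and the member names are invented for the example; the real Agent has more fields):

#include <cstddef>
#include <boost/array.hpp>
#include <boost/pool/object_pool.hpp>

struct Agent {
    static const std::size_t MAX_SAMPLES = 64;  // upper bound chosen to cover the worst case in the sim
    boost::array<float, MAX_SAMPLES> samples;   // fixed-size storage instead of std::vector<float>
    std::size_t sampleCount;                    // how many entries of 'samples' are actually in use
    int id;                                     // ...plus the other ints and floats the real Agent has
};

boost::object_pool<Agent> agentPool;            // fixed-size allocator for Agents

void example() {
    Agent* a = agentPool.construct();           // no general-purpose heap allocation here
    // ...fill in *a and pass it around by pointer...
    agentPool.destroy(a);                       // hands the memory back to the pool
}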

You can use this idea alone or in combination with the above ideas. This idea seems easier to implement than the pipeline thing above, so you might want to try it first.


Yet another idea: Instead of a thread pool for running simulation loops, you can break down the simulation into several stages and execute each stage in its own thread. Producer/consumer queues are used to exchange SimState objects between stages. For this to be effective, the different stages need to have roughly similar CPU workloads (otherwise, one stage will become the bottleneck). This is a different way to exploit parallelism.
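
A minimal sketch of a bounded channel that could connect two such stages (the class and its capacity are illustrative, not an existing Qt or Boost type):

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

class StateChannel {
public:
    explicit StateChannel(std::size_t capacity) : mCapacity(capacity) {}

    // upstream stage: blocks when the downstream stage has fallen behind
    void put(SimState* s) {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotFull.wait(lock, [this] { return mQueue.size() < mCapacity; });
        mQueue.push(s);
        mNotEmpty.notify_one();
    }

    // downstream stage: blocks until a state is available
    SimState* take() {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotEmpty.wait(lock, [this] { return !mQueue.empty(); });
        SimState* s = mQueue.front();
        mQueue.pop();
        mNotFull.notify_one();
        return s;
    }

private:
    std::queue<SimState*> mQueue;
    std::size_t mCapacity;
    std::mutex mMutex;
    std::condition_variable mNotEmpty, mNotFull;
};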
