How do I determine the appropriate check interval?

Posted on 2024-09-16 00:04:57

I'm just starting to work on a Tornado application that is having some CPU issues. The CPU time grows monotonically over time until the CPU is maxed out at 100%. The system is currently designed never to block the main thread: if it needs to do something that blocks and no asynchronous driver is available, it spawns another thread to do the blocking operation.

So we have a main thread that is almost entirely CPU-bound and a number of other threads that are almost entirely IO-bound. From what I've read, this seems to be the perfect way to run into problems with the GIL. My profiling also shows that we're spending a lot of time waiting on signals (which I'm assuming is what __semwait_signal is doing), which, in my limited understanding, is consistent with the effects the GIL would have.
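Before tuning the interpreter, it may help to confirm which threads are actually running or waiting when the CPU is pegged. A minimal sketch, assuming CPython 3 (it relies on the CPython-specific, underscore-prefixed sys._current_frames()), that snapshots every thread's Python stack; dump_thread_stacks is a hypothetical helper name. Sampling it a few times under load should show whether the main thread or the worker threads dominate.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Hypothetical helper: return a text snapshot of every live thread's
    current Python stack.

    sys._current_frames() is CPython-specific; it maps each thread's
    identifier to that thread's topmost frame.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        header = f"--- thread {names.get(ident, '?')} (ident={ident}) ---"
        chunks.append(header + "\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(dump_thread_stacks())
```

The standard library's faulthandler.dump_traceback(all_threads=True) gives a similar dump without any custom code, written straight to stderr.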

If I use sys.setcheckinterval to set the check interval to 300, the CPU growth slows down significantly. What I'm trying to determine is whether I should increase the check interval further, leave it at 300, or be wary of raising it. I do notice that CPU usage improves, but I'm a bit concerned that this will hurt the system's responsiveness.
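For context: sys.setcheckinterval(n) (Python 2) tells the interpreter to consider a thread switch every n bytecode instructions, with a default of 100. Since Python 3.2 the GIL switch is time-based and tuned with sys.setswitchinterval(seconds), default 0.005; setcheckinterval stopped having any effect then and was removed in 3.9. Rather than guessing a value, one option is to measure both sides of the trade-off: how much CPU-bound work the main thread gets done, and how late an IO-style thread wakes up after a short sleep. The sketch below assumes Python 3.2+ and varies sys.setswitchinterval; measure() and its parameters are hypothetical, and the counting loop is only a stand-in for real work, so numbers from the real application under realistic load matter more than this toy.

```python
import sys
import threading
import time


def measure(switch_interval, duration=0.5, nap=0.001):
    """Hypothetical harness: run a CPU-bound loop on the calling thread while
    a helper thread repeatedly sleeps for `nap` seconds. Returns
    (iterations, wakeups, worst_delay_seconds)."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(switch_interval)
    delays = []
    stop = threading.Event()

    def sleeper():
        while not stop.is_set():
            start = time.perf_counter()
            time.sleep(nap)  # releases the GIL while sleeping
            # Time beyond the requested nap is largely time spent waiting to
            # reacquire the GIL (plus ordinary OS scheduling delay).
            delays.append(time.perf_counter() - start - nap)

    t = threading.Thread(target=sleeper, name="sleeper")
    t.start()
    iterations = 0
    deadline = time.perf_counter() + duration
    try:
        while time.perf_counter() < deadline:
            iterations += 1  # stand-in for CPU-bound work on the main thread
    finally:
        stop.set()
        t.join()
        sys.setswitchinterval(old)  # always restore the previous setting
    worst = max(delays) if delays else float("nan")
    return iterations, len(delays), worst


if __name__ == "__main__":
    for interval in (0.001, 0.005, 0.05):
        its, wakeups, worst = measure(interval)
        print(f"interval={interval:.3f}s iterations={its} "
              f"wakeups={wakeups} worst_delay={worst * 1000:.1f}ms")
```

A larger interval should let the main loop run longer between switches, at the price of a higher worst-case wake-up delay for the other threads; that delay tends to track the interval. Choosing a value then comes down to how much latency the IO threads can tolerate.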

Of course, the correct answer is probably that we need to rethink our architecture to take the GIL into account. But that isn't something that can be done immediately. So how do I determine the appropriate course of action to take in the short-term?

Comments (1)

彼岸花似海 2024-09-23 00:04:57

The first thing I would check is that your threads are exiting properly. It's very hard to figure out what's going on from your description alone, but you use the word "monotonically," which implies that CPU use is tied to time rather than to load.
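A quick way to test the "threads never exit" hypothesis is to log the live thread count periodically and compare it with request load. A minimal sketch using only threading.enumerate() and threading.active_count(); thread_census and watch_threads are hypothetical helper names, and grouping by the text before the first "-" is an assumption that suits default names such as "Thread-7".

```python
import threading
import time
from collections import Counter


def thread_census():
    """Hypothetical helper: count live threads by name prefix
    (the text before the first '-')."""
    return Counter(t.name.split("-")[0] for t in threading.enumerate())


def watch_threads(interval=5.0, samples=3, report=print):
    """Hypothetical helper: report the live thread count `samples` times,
    `interval` seconds apart."""
    for _ in range(samples):
        report(f"{threading.active_count()} threads: {dict(thread_census())}")
        time.sleep(interval)
```

If the total keeps climbing while traffic rises and falls, the threads are probably leaking rather than finishing their blocking work and exiting.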

You may very well be running into Python's threading limits, but in that case CPU usage should rise and fall with load (the number of active threads), and the context-switching cost should drop as those threads exit. Is there some reason for a thread, once created, to live forever? If so, prioritize that rearchitecture. Otherwise, the short-term task is to figure out why CPU usage is tied to time and not load. That pattern implies each new thread has a permanent, irreversible cost in your system, meaning it never exits.
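One way to keep the thread count bounded, assuming a Python 3 runtime, is to route blocking calls through one shared concurrent.futures.ThreadPoolExecutor rather than creating a thread per call, so worker threads are reused instead of accumulating. In this sketch, blocking_io and run_blocking are hypothetical stand-ins, and max_workers=4 is an arbitrary example value. On Tornado 5.0 or later, a coroutine should be able to await the same pool via IOLoop.current().run_in_executor(BLOCKING_POOL, fn, *args).

```python
import concurrent.futures
import threading
import time

# One shared, bounded pool: at most max_workers threads ever exist,
# no matter how many blocking calls are submitted.
BLOCKING_POOL = concurrent.futures.ThreadPoolExecutor(
    max_workers=4, thread_name_prefix="blocking-io"
)


def blocking_io(x):
    """Hypothetical stand-in for a blocking driver call."""
    time.sleep(0.01)
    return x * 2


def run_blocking(fn, *args):
    """Submit a blocking call to the shared pool instead of spawning a new
    thread per call; returns a concurrent.futures.Future."""
    return BLOCKING_POOL.submit(fn, *args)


if __name__ == "__main__":
    futures = [run_blocking(blocking_io, i) for i in range(20)]
    print([f.result() for f in futures])
    # However many calls were submitted, the pool never exceeds 4 workers.
    print("live threads:", threading.active_count())
```

The trade-off is that a burst of slow calls queues behind the pool's workers instead of each getting its own thread, which caps both the thread count and the GIL contention it causes.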
