What are the common causes of high CPU usage?

Published 2025-01-05 07:31:24

Background:

In my application written in C++, I have created 3 threads:

  • AnalysisThread (or Producer): it reads an input file, parses it, generates patterns, and enqueues them into a std::queue¹.
  • PatternIdRequestThread (or Consumer): it dequeues patterns from the queue and sends them, one by one, to the database through a client (written in C++), which returns a pattern uid that is then assigned to the corresponding pattern.
  • ResultPersistenceThread: it does a few more things, talks to the database, and works as expected as far as CPU usage is concerned.

The first two threads account for 60-80% of the CPU usage, about 35% each on average.

Question:

I don't understand why some threads show such high CPU usage.

I analyse it as follows: if it is the OS that makes decisions such as context switches, interrupts, and scheduling as to which thread should be given access to system resources such as CPU time, then how come some threads in a process happen to use more CPU than others? It looks like some threads forcefully take the CPU from the OS at gunpoint, or the OS has a real soft spot for some threads and is biased towards them from the very beginning, giving them all the resources it has. Why can't it be impartial and treat them all equally?

I know that this is naive. But I get even more confused if I think along this line: the OS gives a thread access to the CPU based on the amount of work the thread has to do, but how does the OS compute or predict the amount of work before the thread has executed completely?

I wonder what the causes of high CPU usage are. How can we identify them? Is it possible to identify them just by looking at the code? What are the tools?

I'm using Visual Studio 2010.

1. I have my doubts about std::queue as well. I know that standard containers aren't thread-safe. But if exactly one thread enqueues items into the queue, is it safe for exactly one thread to dequeue items from it? I imagine it like a pipe: on one side you insert data, on the other you remove data, so why would it be unsafe if both are done simultaneously? That is not the real question in this topic, however; you can add a note in your answer addressing this.

Updates:

I realized that my consumer thread was busy-spinning, which I've fixed for now with a Sleep of 3 seconds. This fix is temporary, and soon I will use an Event instead. But even with the Sleep, the CPU usage has only dropped to 30-40%, and occasionally it goes up to 50%, which doesn't seem desirable from a usability point of view, as the system doesn't respond to the other applications the user is currently working with.

Is there any way that I can still improve on the high CPU usage? As said earlier, the producer thread (which now uses most of the CPU cycles) reads a file, parses packets (of some format) in it, and generates patterns out of them. If I use Sleep there as well, the CPU usage would decrease, but would that be a good idea? What are the common ways to solve this?

风吹雨成花 2025-01-12 07:31:24

Personally I'd be pretty annoyed if my threads had work to do, and there were idle cores on my machine because the OS wasn't giving them high CPU usage. So I don't really see that there's any problem here [Edit: turns out your busy looping is a problem, but in principle there's nothing wrong with high CPU usage].

The OS/scheduler pretty much doesn't predict the amount of work a thread will do. A thread is (to over-simplify) in one of three states:

  1. blocked waiting for something (sleep, a mutex, I/O, etc)
  2. runnable, but not currently running because other things are
  3. running.

The scheduler will select as many things to run as it has cores (or hyperthreads, whatever), and run each one either until it blocks or until an arbitrary period of time called a "timeslice" expires. Then it will schedule something else if it can.

So, if a thread spends most of its time in computation rather than blocking, and if there's a core free, then it will occupy a lot of CPU time.

There's a lot of detail in how the scheduler chooses what to run, based on things like priority. But the basic idea is that a thread with a lot to do doesn't need to be predicted as compute-heavy, it will just always be available whenever something needs scheduling, and hence will tend to get scheduled.

For your example loop, your code doesn't actually do anything, so you'd need to check how it has been optimized before judging whether 5-7% CPU makes sense. Ideally, on a two-core machine a processing-heavy thread should occupy 50% CPU. On a 4 core machine, 25%. So unless you have at least 16 cores then your result is at first sight anomalous (and if you had 16 cores, then one thread occupying 35% would be even more anomalous!). In a standard desktop OS most cores are idle most of the time, so the higher the proportion of CPU that your actual programs occupy when they run, the better.

On my machine I frequently hit one core's worth of CPU use when I run code that is mostly parsing text.

if exactly one thread enqueues items into the queue, then is it safe if
exactly one thread dequeues items from it?

No, that is not safe for std::queue with a standard container. std::queue is a thin wrapper on top of a sequence container (vector, deque, or list); it doesn't add any thread safety. The thread that adds items and the thread that removes items both modify some data in common, for example the size field of the underlying container. You need either some synchronization, or a safe lock-free queue structure that relies on atomic access to the common data. std::queue has neither.
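
To illustrate the first option (adding synchronization around a plain std::queue), here is a minimal sketch. It assumes a compiler with the C++11 <mutex> header (on VS2010 you would need Boost.Thread or a Win32 critical section instead), and the Pattern type and function names are placeholders, not part of the asker's code:

#include <mutex>
#include <queue>

std::queue<Pattern> g_queue;     // shared between producer and consumer
std::mutex g_queue_mutex;        // guards every access to g_queue

// Producer side: push under the lock.
void push_pattern(const Pattern& p) {
    std::lock_guard<std::mutex> lock(g_queue_mutex);
    g_queue.push(p);
}

// Consumer side: pop under the same lock; returns false if the queue is empty.
bool try_pop_pattern(Pattern& out) {
    std::lock_guard<std::mutex> lock(g_queue_mutex);
    if (g_queue.empty())
        return false;
    out = g_queue.front();
    g_queue.pop();
    return true;
}

On its own this only makes the queue safe to share; the consumer still needs a blocking mechanism (see the other answers) so that an empty queue does not turn back into a busy-spin.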

策马西风 2025-01-12 07:31:24

Edit: OK, since you are using a busy-spin to block on the queue, this is most likely the cause of the high CPU usage. The OS is under the impression that your threads are doing useful work when they actually are not, so they get full CPU time. There was an interesting discussion here: Which one is better for performance to check another threads boolean in java

I advise you to either switch to events or another blocking mechanism, or use a synchronized queue instead, and see how it goes.

Also, that reasoning about the queue being thread-safe "because only two threads are using it" is very dangerous.

Assuming the queue is implemented as a linked list, imagine what can happen if it has only one or two elements remaining. Since you have no way of controlling the relative speeds of the producer and the consumer, this may well be the case and so you're in big trouble.
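
For reference, here is a minimal sketch of the event-based blocking this answer suggests, using the Win32 API available to a VS2010 project. The handle name is illustrative, and the queue itself still needs its own locking as discussed above:

#include <windows.h>

// Auto-reset event, initially unsignalled: one waiting thread is released per SetEvent.
HANDLE g_work_available = CreateEvent(NULL, FALSE, FALSE, NULL);

// Consumer: block instead of spinning. After each wake-up the consumer should
// drain the queue, since several items may have been enqueued before the wait returned.
void consumer_wait_for_work() {
    WaitForSingleObject(g_work_available, INFINITE);
}

// Producer: call this right after enqueuing a pattern.
void notify_consumer() {
    SetEvent(g_work_available);
}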

野味少女 2025-01-12 07:31:24

Before you can start thinking about how to optimize your threads to consume less CPU, you need to have an idea of where all that CPU time is spent. One way to obtain this information is by using a CPU profiler. If you don't have one, give Very Sleepy a try. It's easy to use, and free.

The CPU profiler will monitor your running application and take notes of where time is spent. As a result it will give you a list of functions sorted by how much CPU they've used during the sampled period, how many times they were called, etc. Now you need to look at the profiling results, starting from the most CPU-intensive functions, and see what you can change in those to reduce the CPU usage.

The important thing is that once you have profiler results you have actual data that tells you what parts of your application you can optimize to obtain the biggest return.

Now let's consider the kinds of things you can find that are consuming a lot of CPU.

  • A worker thread is typically implemented as a loop. At the top of the loop a check is made to decide if there is work to do and any available work is executed. A new iteration of the loop begins the cycle again.

    You may find that with a setup like this most of the CPU time allocated to this thread is spent looping and checking, and very little is spent actually doing work. This is the so-called busy-wait problem. To partially address this you can add a sleep in between loop iterations, but this isn't the best solution. The ideal way to address this problem is to put the thread to sleep when there is no work to do, and when some other thread generates work for the sleeping thread it sends a signal to awaken it. This practically eliminates the looping overhead; the thread will only use CPU when there is work to do. I typically implement this mechanism with semaphores, but on Windows you can also use an Event object. Here is a sketch of an implementation:

    class MyThread {
    private:
        // 'sem' is a counting-semaphore member whose declaration is omitted
        // in this sketch; a concrete Win32 realization is shown after this list.
        void thread_function() {
            while (!exit()) {
                if (there_is_work_to_do())
                    do_work();
                go_to_sleep();
            }
        }
        // this is called by the thread function when it
        // doesn't have any more work to do
        void go_to_sleep() {
            sem.wait();
        }
    public:
        // this is called by other threads after they add work to
        // the thread's queue
        void wake_up() {
            sem.signal();
        }
    };
    

    Note that in the above solution the thread function always tries to go to sleep after executing one task. If the thread's queue has more work items, then the wait on the semaphore will return immediately, since each time an item was added to the queue the originator must have called the wake_up() function.

  • The other thing you may see in the profiler output is that most of the CPU is spent in functions executed by the worker thread while it is doing work. This is actually not a bad thing: if most of the time is spent working, that means the thread had work to do and there was CPU time available to do that work, so in principle there is nothing wrong here.

    But still, you may not be happy that your application uses so much CPU, so you need to look for ways to optimize your code so that it does the work more efficiently.

    For example, you may find that some little auxiliary function was called millions of times, so while a single run of the function is quick, if you multiply that by a few million it becomes a bottleneck for the thread. At this point you should look for ways to reduce the CPU usage in this function, either by optimizing its code, or by optimizing the caller(s) to call the function fewer times.

    So the strategy here is to start from the most expensive function according to the profiling report and try to make a small optimization. Then you rerun the profiler to see how things changed. You may find that a small change to the most CPU intensive function moves it down to 2nd or 3rd place, and as a result the overall CPU usage was reduced. After you congratulate yourself for the improvement, you repeat the exercise with the new top function. You can continue this process until you are satisfied that your application is as efficient as it can be.
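
For concreteness, here is one way the sem member used in the sketch above could be realized on Windows with the Win32 semaphore API. The wrapper class is an illustrative assumption, not part of the answer's code:

#include <windows.h>
#include <climits>

// Thin wrapper exposing the wait()/signal() interface used by MyThread above.
class Win32Semaphore {
    HANDLE h_;
public:
    Win32Semaphore() : h_(CreateSemaphore(NULL, 0, LONG_MAX, NULL)) {}
    ~Win32Semaphore() { CloseHandle(h_); }
    void wait()   { WaitForSingleObject(h_, INFINITE); }  // blocks until the count is > 0
    void signal() { ReleaseSemaphore(h_, 1, NULL); }      // increments the count, waking one waiter
};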

Good luck.

御弟哥哥 2025-01-12 07:31:24

Although the others have correctly analysed the problem already (as far as I can tell), let me try to add some more detail to the proposed solutions.

Firstly, to summarize the problems:
1. If you keep your consumer thread busy spinning in a for-loop or similar, that's a terrible waste of CPU power.
2. If you use the sleep() function with a fixed number of milliseconds, it is either a waste of CPU, too (if the time amount is too low), or you delay the process unnecessarily (if it's too high). There is no way to set the time amount just right.

What you need to do instead is to use a type of sleep that wakes up just at the right moment, i.e. whenever a new task has been appended to the queue.

I'll explain how to do this using POSIX. I realize that's not ideal when you are on Windows, but, to benefit from it, you can either use POSIX libraries for Windows or use corresponding functions available in your environment.

Step 1: You need one mutex and one signal:

#include <pthread.h>
pthread_mutex_t *mutex  = new pthread_mutex_t;
pthread_cond_t  *signal = new pthread_cond_t;

/* Initialize the mutex and the signal as below.
   Both functions return an error code. If that
   is not zero, you need to react to it. I will
   skip the details of this. */
pthread_mutex_init(mutex,0);
pthread_cond_init(signal,0);

Step 2: Now inside the consumer thread, wait for the signal to be sent. The idea is that the producer sends the signal whenever it has appended a new task to the queue:

/* Lock the mutex. Again, this might return an error code. */
pthread_mutex_lock(mutex);

/* Wait for the signal. This unlocks the mutex and then 'immediately'
   falls asleep. So this is what replaces the busy spinning, or the
   fixed-time sleep. In real code this wait belongs inside a loop that
   re-checks whether the queue actually holds work, because the wait can
   wake up spuriously, and a signal sent before the wait started would
   otherwise be missed. */
pthread_cond_wait(signal,mutex);

/* The program will reach this point only when a signal has been sent.
   In that case the above waiting function will have locked the mutex
   right away. We need to unlock it, so another thread (consumer or
   producer alike) can access the signal if needed.  */
pthread_mutex_unlock(mutex);

/* Next, pick a task from the queue and deal with it. */

Step 2 above should essentially be placed inside an infinite loop. Make sure there is a way for the process to break out of the loop. For example -- although slightly crude -- you can append a 'special' task to the queue that means 'break out of the loop'.

Step 3: Enable the producer thread to send a signal whenever it has appended a task to the queue:

/* We assume we are now in the producer thread and have just appended
   a task to the queue. */
/* First we lock the mutex. This must be THE SAME mutex object as used
   in the consumer thread. */
pthread_mutex_lock(mutex);

/* Then send the signal. The argument must also refer to THE SAME
   signal object as is used by the consumer. */
pthread_cond_signal(signal);

/* Unlock the mutex so other threads (producers or consumers alike) can
   make use of the signal. */
pthread_mutex_unlock(mutex);

Step 4: When everything is finished and you shut down your threads, you must destroy the mutex and the signal:

pthread_mutex_destroy(mutex);
pthread_cond_destroy(signal);
delete mutex;
delete signal;

Finally let me re-iterate one thing the others have said already: You must not use an ordinary std::deque for concurrent access. One way of solving this is to declare yet another mutex, lock it before every access to the deque, and unlock it right after.

Edit: A few more words about the producer thread, in light of the comments. As far as I understand it, the producer thread is currently free to add as many tasks to the queue as it can. So I suppose it will keep doing that and keep the CPU busy to the extent that it isn't delayed by IO and memory access. Firstly, I don't think of the high CPU usage resulting from this as a problem, but rather as a benefit. However, one serious concern is that the queue will grow indefinitely, potentially causing the process to run out of memory space. Hence a useful precaution to take would be to limit the size of the queue to a reasonable maximum, and have the producer thread pause whenever the queue grows too long.

To implement this, the producer thread would check the length of the queue before adding a new item. If it is full, it would put itself to sleep, waiting for a signal to be sent by a consumer when taking a task off the queue. For this you could use a secondary signal mechanism, analogous to the one described above.
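
A short sketch of that producer-side bound, reusing the same pthread primitives as above. The shared queue, the size limit, and the second condition variable (space_available, signalled by the consumer after it removes an item) are assumptions for illustration:

/* Producer thread, just before appending a freshly generated pattern. */
pthread_mutex_lock(mutex);

/* Sleep while the queue is full; re-check the size on every wake-up. */
while (work_queue.size() >= MAX_QUEUE_SIZE)
    pthread_cond_wait(space_available, mutex);

work_queue.push(new_pattern);   /* append the task                       */
pthread_cond_signal(signal);    /* tell the consumer there is work to do */

pthread_mutex_unlock(mutex);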

箹锭⒈辈孓 2025-01-12 07:31:24

Threads consume resources such as memory. Blocking/unblocking a thread incurs a one-off cost. If a thread blocks/unblocks tens of thousands of times per second, this can waste significant amounts of CPU.

However, once a thread is blocked, it doesn't matter how long it stays blocked; there is no ongoing cost.
The popular way to find performance problems is to use profilers.

However, I do this a lot, and my method is this: http://www.wikihow.com/Optimize-Your-Program%27s-Performance

晨光如昨 2025-01-12 07:31:24

Thread CPU usage depends on many factors, but in the main the OS can only assign processing time based on points at which it can interrupt a thread.

If your thread interacts with hardware in any way, then this gives the OS a chance to interrupt the thread and assign processing elsewhere, mainly based on the assumption that hardware interaction takes time. In your example you're using the iostream library and thus interacting with hardware.

If your loop didn't have this, then it would most likely use nearly 100% CPU.

美人迟暮 2025-01-12 07:31:24

As people have said, the right way to synchronize the hand-off between the producer and consumer threads would be to use a condition variable. When the producer wants to add an element to the queue, it locks the mutex associated with the condition variable, adds the element, and notifies waiters on the condition variable. The consumer waits on the same condition variable and, when notified, consumes elements from the queue, then locks again. I'd personally recommend using boost::interprocess for these, but it can be done in a reasonably straightforward way using other APIs too.

Also, one thing to keep in mind is that while conceptually each thread is operating on one end of the queue only, most libraries implement an O(1) count() method, which means they have a member variable to track the number of elements, and this is an opportunity for rare and difficult-to-diagnose concurrency issues.

If you're looking for a way to reduce the cpu usage of the consumer thread (yes, I know this is your real question)... well, it sounds like it's actually doing what it's supposed to now, but the data processing is expensive. If you can analyze what it's doing, there may be opportunities for optimization.

If you want to throttle the producer thread intelligently... it's a little more work, but you could have the producer thread add items to the queue until it reaches a certain threshold (say 10 elements), then wait on a different condition variable. When the consumer consumes enough data that it causes the number of queued elements to go below a threshold (say 5 elements), then it notifies this second condition variable. If all parts of the system can move the data around quickly, then this could still consume a lot of CPU, but it would be spread relatively evenly amongst them. It's at this point that the OS should be responsible for letting other unrelated processes get their fair(ish) share of the CPU.
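
A compact sketch of that threshold scheme, written with C++11 std::condition_variable rather than boost::interprocess (so VS2012+ or Boost would be needed); the Pattern type, the thresholds, and the names are illustrative:

#include <condition_variable>
#include <mutex>
#include <queue>

std::queue<Pattern> g_patterns;
std::mutex g_mtx;
std::condition_variable g_can_produce;   // signalled when the queue drains below 5
std::condition_variable g_can_consume;   // signalled whenever an item is added

void produce(const Pattern& p) {
    std::unique_lock<std::mutex> lock(g_mtx);
    g_can_produce.wait(lock, [] { return g_patterns.size() < 10; });  // upper threshold
    g_patterns.push(p);
    g_can_consume.notify_one();
}

Pattern consume() {
    std::unique_lock<std::mutex> lock(g_mtx);
    g_can_consume.wait(lock, [] { return !g_patterns.empty(); });
    Pattern p = g_patterns.front();
    g_patterns.pop();
    if (g_patterns.size() < 5)            // lower threshold: let the producer resume
        g_can_produce.notify_one();
    return p;
}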

裸钻 2025-01-12 07:31:24
  1. Use asynchronous (file and socket) IO to reduce useless CPU waiting time.
  2. Use a vertical threading model to reduce context switches, if possible.
  3. Use lock-free data structures.
  4. Use a profiling tool, such as VTune, to find the hot spots and optimize them.