Minimum size of a piece of work to be effectively executed on another thread?
I have a low latency system that receives UDP messages. Depending on the message, the system responds by sending out 0 to 5 messages. Figuring out each possible response takes 50 us (microseconds), so if we have to send 5 responses, it takes 250 us.
I'm considering splitting the system up so that each possible response is calculated by a different thread, but I'm curious about the minimum "work time" needed to make that better. While I know I need to benchmark this to be sure, I'm interested in opinions about the minimum piece of work that should be done on a separate thread.
If I have 5 threads waiting on a signal to do 50 us of work, and they don't contend much, will the total time before all 5 are done be more or less than 250 us?
Comments (3)
Passing data from one thread to another is very fast, 1-4 us, provided the receiving thread is already running on a core (and not sleeping, waiting or yielding). If the thread has to be woken up, the hand-off can take around 15 us, and the task itself will also take longer because the cache is likely to have lots of misses. This means the task can take 2-3x longer.
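Plugging the figures from this answer into a simple latency model shows why 50 us tasks sit well above the break-even point. This is a minimal sketch under stated assumptions: every response runs on its own core, the hand-off latencies of the five threads overlap (so one hand-off is paid per batch), and a woken thread's task is slowed by a cache-miss factor `slowdown`. The function and parameter names are illustrative, not from the answer. Break-even solves n·w > handoff + slowdown·w for the task size w.

```python
def sequential_us(n_tasks: int, work_us: float) -> float:
    """Latency when one thread computes every response back to back."""
    return n_tasks * work_us


def parallel_us(work_us: float, handoff_us: float, slowdown: float = 1.0) -> float:
    """Latency when each response runs on its own core.

    Assumes the hand-offs overlap, so the batch finishes one hand-off plus
    one (possibly cache-slowed) task after it starts.
    """
    return handoff_us + work_us * slowdown


def break_even_work_us(n_tasks: int, handoff_us: float, slowdown: float = 1.0) -> float:
    """Smallest task size (us) for which the parallel version is faster.

    Solves n * w > handoff + slowdown * w for w; returns infinity when the
    slowdown is so large that going parallel can never win.
    """
    if n_tasks <= slowdown:
        return float("inf")
    return handoff_us / (n_tasks - slowdown)


if __name__ == "__main__":
    print(sequential_us(5, 50))                               # 250
    print(parallel_us(50, handoff_us=4))                      # 54.0  (hot, spinning threads)
    print(parallel_us(50, handoff_us=15, slowdown=3.0))       # 165.0 (woken threads, cold cache)
    print(break_even_work_us(5, handoff_us=4))                # 1.0
    print(break_even_work_us(5, handoff_us=15, slowdown=3.0)) # 7.5
```

Under these assumptions even the pessimistic case (165 us) beats 250 us, and the minimum worthwhile task is in the single-digit microseconds; real values for the hand-off cost and slowdown have to come from a benchmark on the target machine.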
Is that 50 us compute-bound or IO-bound? If it is compute-bound, do you have multiple cores available to run these in parallel?
Sorry, lots of questions, but your particular environment will affect the answer. You need to profile and determine what makes a difference in your particular scenario (perhaps run tests with differently sized thread pools?).
Don't forget, too, that by default threads take up a significant amount of memory for their stacks (512 KB by default, IIRC), and that could affect performance as well (through paging etc.).
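The experiment with differently sized thread pools can be structured as below. This is a minimal harness sketch; `time_fan_out` and `busy_task` are hypothetical names. It builds each pool before timing so thread creation is excluded, runs one untimed warm-up batch, and reports the median over many repeats. Caveat: in CPython the GIL stops pure-Python, compute-bound tasks from running in parallel, so for compute-bound work this only illustrates the measurement approach and should be ported to the production language; for IO-bound work it can be used as-is.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def time_fan_out(
    task: Callable[[int], object],
    n_tasks: int,
    pool_sizes: Iterable[int],
    repeats: int = 200,
) -> Dict[int, float]:
    """Return the median latency in microseconds of running `n_tasks` calls
    of `task` on a pool of each size in `pool_sizes`."""
    results: Dict[int, float] = {}
    for size in pool_sizes:
        # The pool is built outside the timed region, so thread creation
        # is not counted; only dispatch, execution and collection are.
        with ThreadPoolExecutor(max_workers=size) as pool:
            # ThreadPoolExecutor starts workers lazily, so run one untimed
            # batch first to warm up threads and caches.
            list(pool.map(task, range(n_tasks)))
            samples = []
            for _ in range(repeats):
                start = time.perf_counter()
                list(pool.map(task, range(n_tasks)))
                samples.append((time.perf_counter() - start) * 1e6)
        results[size] = statistics.median(samples)
    return results


def busy_task(i: int) -> int:
    """Hypothetical stand-in for computing one response."""
    return sum(range(1000 + i))


if __name__ == "__main__":
    timings = time_fan_out(busy_task, n_tasks=5, pool_sizes=[1, 2, 5])
    for size, median_us in timings.items():
        print(f"{size} threads: {median_us:.1f} us")
```

Medians are used rather than means because scheduler and wake-up outliers can dominate an average at microsecond scales.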
If you have more cores than threads, and if the threads are truly independent, then I would not be surprised if the multi-threaded approach took less than 250 us. Whether it does or not will depend on the overhead of creating and destroying threads. Your situation seems ideal, however.
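Since the overhead of creating and destroying threads matters here, one common approach is to start the workers once and reuse them for every incoming message. A minimal sketch of that shape, assuming a hypothetical `compute_response` for the ~50 us calculation and a hypothetical `handle_message` entry point; UDP receiving and sending are omitted, and CPython's GIL still prevents pure-Python compute from running truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Created once at start-up and reused for every message, so no thread is
# created or destroyed on the latency-critical path. Five workers matches
# the maximum of five responses per message.
POOL = ThreadPoolExecutor(max_workers=5)


def compute_response(message: bytes, kind: int) -> bytes:
    """Hypothetical stand-in for the ~50 us calculation of one response."""
    return b"%d:" % kind + message


def handle_message(message: bytes, kinds: List[int]) -> List[bytes]:
    """Compute every required response (0 to 5 of them) concurrently.

    Results come back in the same order as `kinds`, regardless of which
    worker finishes first.
    """
    if not kinds:
        return []
    futures = [POOL.submit(compute_response, message, k) for k in kinds]
    return [f.result() for f in futures]


if __name__ == "__main__":
    print(handle_message(b"quote", [0, 3]))  # [b'0:quote', b'3:quote']
```

A variation worth benchmarking is to compute one of the responses on the receiving thread itself, which saves one hand-off per message.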