Does mixing SIMD instructions with multithreading affect performance?
I'm interested in doing a face-recognition project that makes use of SIMD instruction sets. During the first semester of this year I also learned a bit about threads, and I was wondering whether I could combine the two.
When should I avoid combining multithreading and SIMD instructions? When is it worth doing?
Comments (3)
Saving the x87/MMX/XMM/YMM registers can take quite some time and cause significant cache thrashing. Traditionally, saving and restoring FP state is done lazily: on a context switch, the kernel remembers the outgoing thread as the "owner" of the FP state and sets the TS flag in CR0, which causes a trap to the kernel the next time any thread attempts to execute an FP instruction. Only at that point is the old owner's FP state saved and the currently executing thread's FP state restored. (Some newer kernels switch FP state eagerly instead.)
Now, if for extended periods of time (several or many context switches) no thread other than yours uses FP instructions, the lazy policy means that no FP state is saved or restored at all, and you won't take a performance hit.
Since we're obviously talking about a multiprocessor system, the threads that execute your algorithm in parallel won't conflict with each other: each one should run on its own CPU/core/hardware thread (HT), with its own private set of registers.
tl;dr
You shouldn't be concerned with the overhead of saving and restoring FP registers.
Why do you think there would be a problem? SIMD registers are swapped out like any other CPU registers when a thread switch occurs.
There aren't any new issues to worry about when combining multithreading and SIMD. So long as you're doing the SIMD correctly and efficiently, you shouldn't have anything to worry about.
That is, SIMD has its own implementation challenges, as does multithreading, but combining them won't make either one more complex.
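A minimal sketch of this layering, assuming an x86-64 target with SSE and a C++11 compiler. `simd_sum` is an ordinary SSE loop that knows nothing about threads, and `parallel_simd_sum` simply hands each `std::thread` a contiguous slice of the input. Both function names are hypothetical and chosen only for illustration; summing floats stands in for whatever per-pixel work a face-recognition kernel would actually do.

```cpp
#include <immintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical SIMD kernel: sums n floats, four lanes at a time.
// It is identical whether it is called from one thread or from many.
float simd_sum(const float* data, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned load of 4 floats
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  // spill the four partial sums
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) total += data[i];  // scalar tail for leftover elements
    return total;
}

// Hypothetical threading layer: one contiguous chunk per worker thread.
// Each worker uses the SIMD registers of whichever core it runs on.
float parallel_simd_sum(const std::vector<float>& v, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<float> partial(num_threads, 0.0f);
    std::vector<std::thread> workers;
    const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        workers.emplace_back([&, t, begin, end] {
            partial[t] = simd_sum(v.data() + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();
    float total = 0.0f;
    for (float p : partial) total += p;  // combine per-thread results
    return total;
}
```

Calling `parallel_simd_sum(v, std::thread::hardware_concurrency())` would use one worker per hardware thread; `hardware_concurrency()` may return 0, which this sketch treats as 1. For small inputs the cost of starting threads will likely outweigh any gain, so it is worth measuring before committing to this split.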