Why do the SSE integer average instructions (PAVGB/PAVGW) add 1 to the temporary sum before computing the final result?
I have been working on SSE optimization for a video processing algorithm recently. I need to write exactly the same algorithm in C to cross-check its correctness. Several times I have forgotten that PAVGB/PAVGW add 1 to the temporary sum, which made the results of the two implementations differ.
I can modify the C implementation to make them match, since the difference doesn't matter for my purposes. But why are these instructions designed like this? Is there a mathematical reason behind it?
The Intel instruction reference only mentions this behavior and doesn't explain why. I also tried googling, but couldn't find anything about it.
UPDATE:
Thanks to Paul for the answer. I didn't realize this was a rounding vs. truncation issue. But since both operands are integers, the only possible fractional part is 0.5, and in that case there are two "nearest integers". AFAIK there are several rounding methods for this situation. Why do the instructions round up specifically? Do most related applications need rounding up?
Comments (1)
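It's to give correct rounding, i.e. round to nearest rather than truncation. In general, when you divide a non-negative integer `x` by `N`, you need to add half the divisor before dividing to get correct rounding: `y = (x + N / 2) / N`. For the average of two values, `N` is 2, which gives `(a + b + 1) >> 1`.
If you just do `y = x / N`,
then you will get a truncated (round toward zero) result.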
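As a minimal sketch of the difference between the two forms of division, assuming unsigned operands, `n > 0`, and no overflow in `x + n / 2`: the function names `div_trunc`, `div_round`, and `check_div_examples` are made up for illustration and are not from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating division: the fractional part is simply discarded. */
uint32_t div_trunc(uint32_t x, uint32_t n) {
    return x / n;
}

/* Round-to-nearest division: add half the divisor before dividing.
   Exact ties (fraction 0.5) round up.
   Assumes n > 0 and that x + n / 2 fits in uint32_t. */
uint32_t div_round(uint32_t x, uint32_t n) {
    return (x + n / 2) / n;
}

/* Hypothetical self-check with a few hand-worked cases. */
void check_div_examples(void) {
    assert(div_trunc(7, 2) == 3);   /* 3.5  truncates to 3 */
    assert(div_round(7, 2) == 4);   /* 3.5  rounds up to 4 */
    assert(div_trunc(9, 4) == 2);   /* 2.25 truncates to 2 */
    assert(div_round(9, 4) == 2);   /* 2.25 rounds to 2    */
    assert(div_trunc(11, 4) == 2);  /* 2.75 truncates to 2 */
    assert(div_round(11, 4) == 3);  /* 2.75 rounds to 3    */
}
```

For signed values the same trick needs extra care, because C integer division truncates toward zero rather than toward negative infinity.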
Round-to-nearest is generally preferred for image processing and DSP-type applications.
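For cross-checking an SSE implementation against plain C, a scalar model of a single lane might look like the sketch below. It follows the behavior described in the question (add the two unsigned operands, add 1, shift right by one, with the sum held in a wider type). The names `pavgb_ref`, `pavgw_ref`, `avg_trunc_u8`, and `count_mismatches_u8` are hypothetical helpers, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one PAVGB lane: (a + b + 1) >> 1.
   The sum is formed in a wider type so the carry bit is not lost. */
uint8_t pavgb_ref(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}

/* Scalar model of one PAVGW lane, same idea for 16-bit elements. */
uint16_t pavgw_ref(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}

/* Truncating average, i.e. what a naive (a + b) / 2 in C produces. */
uint8_t avg_trunc_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint32_t)a + (uint32_t)b) >> 1);
}

/* Counts the 8-bit input pairs where the two averages disagree.
   They differ exactly when a + b is odd, i.e. the true average
   ends in .5 and the rounding direction matters. */
uint32_t count_mismatches_u8(void) {
    uint32_t mismatches = 0;
    for (uint32_t a = 0; a < 256; ++a) {
        for (uint32_t b = 0; b < 256; ++b) {
            if (pavgb_ref((uint8_t)a, (uint8_t)b) !=
                avg_trunc_u8((uint8_t)a, (uint8_t)b)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Half of all 8-bit input pairs have an odd sum, so a C reference that uses the truncating form will disagree with the SSE result on roughly every second pixel pair, always by exactly 1.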