c division integer-division strength-reduction

我怎样才能将除法强度减少 2^n + 1？

发布于 2024-09-28 23:09:15 字数 445 浏览 11 评论 0原文

我需要在代码的热路径中执行一些整数除法。我已经通过分析和周期计数确定整数除法正在消耗我的成本。我希望我能做点什么来将部门减少到更便宜的程度。

在此路径中，我除以 2^n+1，其中 n 是变量。本质上，我想优化这个函数以删除除法运算符：

unsigned long compute(unsigned long a, unsigned int n)
{
    return a / ((1 << n) + 1);
}

如果我除以 2^n，我只需将 div 替换为右移 n。如果我除以一个常数，我会让编译器强度减少该特定除法，可能会将其变成乘法和一些移位。

是否有适用于 2^n+1 的类似优化？

编辑：这里的 a 可以是任意 64 位整数。 n 只取 10 到 25 之间的几个值。我当然可以为每个 n 预先计算一些值，但不能为 a 预先计算一些值。

原文

I need to perform some integer divisions in the hot path of my code. I've already determined via profiling and cycle counting that the integer divisions are costing me. I'm hoping there's something I can do to strength reduce the divisions into something cheaper.

In this path, I am dividing by 2^n+1, where n is variable. Essentially I want to optimize this function to remove the division operator:

unsigned long compute(unsigned long a, unsigned int n)
{
    return a / ((1 << n) + 1);
}

If I were dividing by 2^n, I would just replace the div with a shift-right by n. If I were dividing by a constant, I would let the compiler strength reduce that specific division, likely turning it into a multiply and some shifts.

Is there a similar optimization that applies to 2^n+1?

Edit: a here can be an arbitrary 64-bit integer. n takes only a few values between 10 and, say, 25. I can certainly precompute some values for each n, but not for a.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

习惯那些不曾习惯的习惯 2024-10-05 23:09:15

由于您只能将 int 移动这么多位置，因此您可以将所有这些情况放入由常量除法之一的选择中：

unsigned long compute(unsigned long a, unsigned int n)
{
    // assuming a 32-bit architecture (making this work for 64-bits 
    // is left as an exercise for the reader):
    switch (n) {
        case  0: return a / ((1 << 0) + 1);
        case  1: return a / ((1 << 1) + 1);
        case  2: return a / ((1 << 2) + 1);

            // cases 3 through 30...

        case 31: return a / ((1 << 31) + 1);
    }
}

因此，现在每个除法都由一个常量进行，编译器通常会减少该常量一系列乘法/移位/加法指令（如所提到的问题）。请参阅 ac/c++ 编译器将按二次方值进行常量除法优化为移位？了解详细信息。

Since you can only shift an int so many places, you can put all those cases into a choice of one of several divisions by a constant:

unsigned long compute(unsigned long a, unsigned int n)
{
    // assuming a 32-bit architecture (making this work for 64-bits 
    // is left as an exercise for the reader):
    switch (n) {
        case  0: return a / ((1 << 0) + 1);
        case  1: return a / ((1 << 1) + 1);
        case  2: return a / ((1 << 2) + 1);

            // cases 3 through 30...

        case 31: return a / ((1 << 31) + 1);
    }
}

So now each division is by a constant, which compilers will typically reduce to a series of multiply/shift/add instructions (as the question mentioned). See Does a c/c++ compiler optimize constant divisions by power-of-two value into shifts? for deatils.

回复收藏 0 原文

抚笙 2024-10-05 23:09:15

您可以用常量替换整数除法，也可以用幻数和移位来替换乘法（模字长）。

可以针对已知常数预先计算幻数。

由于 n 不能采用多个值，例如 0..31，因此可以“轻松”地预先计算所有 n 的这些幻数并将其存储在包含 32 个元素的表中。

用于计算幻数的Javascript页面

一个好的编译器可以计算幻数并用乘法代替整数除法如果除数在编译时是常数，则进行移位。根据其余代码如何围绕性能关键代码构建，您可以使用宏或内联技巧来展开 n 的所有可能值，并让编译器完成查找幻数的工作（类似于答案使用开关，但我会在常量区域中放置更多代码，否则这可能是一种不值得的权衡——分支也会降低性能）

详细说明以及用于计算幻数的代码可以在Henry S. Warren, Jr. 所著的《Hackers Delight》一书（强烈推荐必读之书！）第 180 页。