Efficient (cycles-wise) algorithm to compute modulo 25?

Posted on 2024-07-24 06:50:53


I have code in which I am computing x % 25. x always takes a positive value, but its dynamic range is large.

I found out that this particular piece of code computing x % 25 is taking a large number of cycles. I need to optimize it.

A pre-computed lookup table is ruled out because the table could need a lot of memory.

As a second approach, I coded the fragment below (C code):

int mod(int a, int b)
{
    int r = a;
    /* repeated subtraction: one iteration per multiple of b in a */
    while(r >= b)
    {
        r = r - b;
    }
    return r;
}

1.) How can I optimize this code further for cycles (squeeze it to the max)?

2.) Is there an entirely different, optimized way to achieve x % 25? (I know it's not a common operation, but I'm still looking for clever ideas people might have used in their experience that might help me.)

Thank you.

-AD

EDIT:

I think the native modulo operator % in C internally uses a division operation (/), which is costly on the processor I am using (no div instruction). Hence I'm trying to see whether a custom implementation can beat the computation the % operator performs.

-AD

Answers (22)

北斗星光 2024-07-31 06:50:53

I suggest reading Hacker's Delight. It describes very fast remainder algorithms for constant divisors. They would almost certainly beat a general algorithm.

Update: Here is some example code... It can probably be reworked to avoid the temporary long long.

unsigned mod25(unsigned n)
{
    unsigned reciprocal = 1374389535; // ceil(2^35 / 25); assumes 32-bit unsigned
    unsigned div25 = ((unsigned long long)n * reciprocal) >> 35; // == n / 25
    return n - div25 * 25;
}
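On the "avoid the temporary long long" point, here is a minimal sketch. It assumes the inputs are known never to exceed 43,690, which is an assumption and not part of the answer. A smaller reciprocal, 5243 = ceil(2^17 / 25), keeps the product inside 32 bits. Because 25 * 5243 = 2^17 + 3, the estimate (x * 5243) >> 17 equals floor(x / 25) as long as 3x < 2^17, and that condition is where the 43,690 bound comes from.

```c
#include <stdint.h>

/* Hypothetical helper: x % 25 using only a 32-bit multiply.
 * Valid for 0 <= x <= 43690: 5243 = ceil(2^17 / 25), and
 * (x * 5243) >> 17 == x / 25 while 3 * x < 2^17.
 * The largest product, 43690 * 5243, is about 2.3e8 and fits in 32 bits. */
uint32_t mod25_small(uint32_t x)
{
    uint32_t q = (x * 5243u) >> 17;  /* quotient, exact in the stated range */
    return x - q * 25u;
}
```

Outside that range the bound no longer guarantees an exact quotient, so the full-range version above still needs the wider product.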
翻身的咸鱼 2024-07-31 06:50:53

I was inspired by Pax's answer and made a more general purpose algorithm.

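unsigned mod(unsigned a, unsigned b) {
    /* Assumes b > 0 and a <= UINT_MAX / 2, so s <<= 1 cannot overflow. */
    unsigned s = b;
    while (s <= a) {
        s <<= 1;            /* smallest b * 2^k greater than a */
    }
    unsigned r = a;
    while (r >= b) {
        s >>= 1;
        if (s <= r) {
            r -= s;         /* subtract b * 2^k when it fits */
        }
    }
    return r;
}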

This subtracts power-of-two multiples of b from a until the result is found.

EDIT: added the if condition to make it work properly.

As an example, for 100 % 7 it first works out that 7 * 2 * 2 * 2 * 2 = 112. Then it halves s (112) and subtracts the result from r (100) whenever s <= r, and keeps doing this until the modulo is found. Therefore,

s = 112 / 2 = 56, r = 100 - 56 = 44
s = 56 / 2 = 28, r = 44 - 28 = 16
s = 28 / 2 = 14, r = 16 - 14 = 2

therefore, 100 % 7 = 2

╰ゝ天使的微笑 2024-07-31 06:50:53

Here's another solution I came up with:

int mod25(int x){
  /* 25 * (all powers of 2 <= INT_MAX), descending */
  if (x >= 1677721600) x -= 1677721600;
  if (x >=  838860800) x -=  838860800;
  if (x >=  419430400) x -=  419430400;
  if (x >=  209715200) x -=  209715200;
  if (x >=  104857600) x -=  104857600;
  if (x >=   52428800) x -=   52428800;
  if (x >=   26214400) x -=   26214400;
  if (x >=   13107200) x -=   13107200;
  if (x >=    6553600) x -=    6553600;
  if (x >=    3276800) x -=    3276800;
  if (x >=    1638400) x -=    1638400;
  if (x >=     819200) x -=     819200;
  if (x >=     409600) x -=     409600;
  if (x >=     204800) x -=     204800;
  if (x >=     102400) x -=     102400;
  if (x >=      51200) x -=      51200;
  if (x >=      25600) x -=      25600;
  if (x >=      12800) x -=      12800;
  if (x >=       6400) x -=       6400;
  if (x >=       3200) x -=       3200;
  if (x >=       1600) x -=       1600;
  if (x >=        800) x -=        800;
  if (x >=        400) x -=        400;
  if (x >=        200) x -=        200;
  if (x >=        100) x -=        100;
  if (x >=         50) x -=         50;
  if (x >=         25) x -=         25;
  return x;
}

This doesn't use divides or multiplies, just 27 comparisons and at most 27 subtractions.

It's a little hard to convince yourself that this works, but it does (at least for non-negative values of x).

The above code is really an unrolled version of this:

int mod25(int x){
  /* divisor runs through 25 * 2^k for k = 26 down to 0 */
  for(int divisor = 1677721600; divisor >= 25; divisor >>= 1) {
    if (x >= divisor) x -= divisor;
  }
  return x;
}

By unrolling it we avoid doing the loop comparison and also the shifts at the expense of larger code. You could even partially unroll it using Duff's device if you felt so inclined, but with only 27 iterations total, and such a tiny bit of code per-iteration, I'd be inclined to just unroll it all the way.

Here's how it works: Every non-negative integer x can be expressed as (n * 25) + k where n is a non-negative integer and k is an integer from 0 to 24. k also happens to be the result we want, so if we could compute x - (n * 25) we'd get our answer. We want to be able to do this without knowing n up-front, though.

Think about n in binary. If we could turn off each of the 1 bits we'd get 0. One way to do this is to start at large powers of 2 and work our way down, subtracting each power of 2 only if the current value of n is greater than or equal to that power of 2.

Since we're dealing with (n * 25) we actually need descending powers of 2 times 25. Since k is strictly less than 25, and the smallest divisor we ever consider is 25, this works even when we're dealing with (n * 25) + k.

So each comparison + subtraction is zeroing out one bit of n, and at the end we're left with k, the remainder.
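To make the bit-clearing argument concrete, here is a hypothetical variant of the loop above. It is not part of the original answer. It also records which multiples 25 * 2^k it subtracted: setting bit k of n for each subtraction rebuilds the quotient, so the function yields both x / 25 and x % 25. Like the unrolled version, it assumes 0 <= x <= INT32_MAX. For example, x = 1234 = 49 * 25 + 9, and 49 is 110001 in binary, so the function subtracts 800, 400 and 25, leaving 9 with n = 49.

```c
#include <stdint.h>

/* Hypothetical variant: returns x % 25 and stores x / 25 in *quotient.
 * Each subtraction of 25 * 2^k corresponds to bit k of the quotient.
 * Assumes 0 <= x <= INT32_MAX. */
int32_t mod25_with_quotient(int32_t x, int32_t *quotient)
{
    int32_t n = 0;
    for (int k = 26; k >= 0; --k) {
        int32_t divisor = (int32_t)25 << k;  /* at most 1677721600 */
        if (x >= divisor) {
            x -= divisor;
            n |= (int32_t)1 << k;            /* bit k of the quotient is set */
        }
    }
    *quotient = n;
    return x;
}
```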

-黛色若梦 2024-07-31 06:50:53

Since you want the modulus by a constant, you can probably beat it by using reciprocal multiplication. This paper shows how you can divide by a constant in such a manner, and towards the end, how to get the remainder from it.

失眠症患者 2024-07-31 06:50:53

Here's the best I could come up with:

int mod25(int x)
{
    while((x = (x & 31) + 7 * (x >> 5)) >= 25)
        x -= 25;

    return x;
}

It approximates x % 25 with x % 32 + 7 * (x / 32). Because 32 ≡ 7 (mod 25), that value overshoots x % 25 by a multiple of 25, which allows the step to be repeated.

Performance seems to be adequate: A value of x = 2147483647 (aka INT_MAX) needs 11 iterations.
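Here is a short derivation of why each step preserves the remainder, in the answer's notation. Since 32 = 25 + 7, replacing the weight 32 of the high part with 7 changes x only by a multiple of 25, and it makes the value smaller.

```latex
x = 32\left\lfloor \tfrac{x}{32} \right\rfloor + (x \bmod 32)
  \equiv 7\left\lfloor \tfrac{x}{32} \right\rfloor + (x \bmod 32) \pmod{25},
\qquad\text{e.g. } 1000 = 32 \cdot 31 + 8 \equiv 7 \cdot 31 + 8 = 225 = 9 \cdot 25 \equiv 0 \pmod{25}.
```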

扬花落满肩 2024-07-31 06:50:53

Oh my <deity of choice>. I can't believe some of these answers.

First thing: repeated subtraction, even Pax's version, will never, ever be optimal. Consider the following:

20 % 25

that's easy and fast using repeated subtraction, but:

65535 % 25

will be horribly slow, 600+ iterations. That's an average of 300 iterations for 16 bit numbers. As for 32 bit number, well, just don't even go there.

The fastest way to do this is to use long division. See Niki's answer.

But, this is what the compiler will be generating anyway, at least, one would hope it is what the compiler is generating. It's always best to check if you're using a compiler for a niche processor.

The best way to speed this up is to not do the modulus in the first place. Why do you need the modulus, and can you refactor the code/algorithm to avoid it, or at least make it trivial?
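One common way to "make the modulus trivial" is sketched below under an assumption about the caller. If x only ever advances by a small step (a loop counter, for example), the remainder can be carried alongside x. Each update then costs one add and one compare instead of a full recomputation. The names counter25 and counter25_add are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical type: a counter that carries x % 25 alongside x. */
typedef struct {
    uint32_t x;    /* the value itself */
    uint32_t rem;  /* invariant: rem == x % 25 */
} counter25;

/* Advance by step. Assumes step < 25 and that x does not wrap around. */
static void counter25_add(counter25 *c, uint32_t step)
{
    c->x += step;
    c->rem += step;          /* rem <= 24 + 24, so one correction suffices */
    if (c->rem >= 25u)
        c->rem -= 25u;
}
```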

紧拥背影 2024-07-31 06:50:53

The problem with your loop is that it's O(r / b), i.e. linear in the value of r, so it'll be very slow for large values of r. I'd suggest something like this:

for (int s = MAX_SHIFT; s>=0; s--)   /* MAX_SHIFT: largest s for which b<<s still fits */
  if (r >= (b<<s)) r -= (b<<s);      /* >=, so exact multiples are removed too */

But I doubt that your compiler is doing anything much more expensive than that.
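Here is a runnable sketch of that loop for b = 25 on 32-bit unsigned values. MAX_SHIFT = 27 is this sketch's assumption, since the answer leaves it open. 25 << 27 = 3355443200 still fits in uint32_t, and every 32-bit value is below 2 * (25 << 27), so a single pass over s = 27..0 is enough.

```c
#include <stdint.h>

/* Assumed for this sketch: 25u << 27 is the largest such shift that fits
 * in 32 bits, and any uint32_t r is below twice that value. */
#define MAX_SHIFT 27

uint32_t mod25_shift(uint32_t r)
{
    const uint32_t b = 25u;
    for (int s = MAX_SHIFT; s >= 0; s--) {
        if (r >= (b << s))   /* compare against 25 * 2^s */
            r -= (b << s);   /* clears bit s of the quotient */
    }
    return r;
}
```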

把回忆走一遍 2024-07-31 06:50:53

On many processors, integer multiplication is faster than integer division. This blog post shows how to replace a constant integer division with a constant integer multiplication. By rearranging the maths a bit you can get the remainder instead of the quotient. Note, however, that if you are using a moderately sophisticated compiler, then this is already done for you. You just write x % 25 and the compiler works out the rest. You should check the generated assembly code for your code, verifying that the compiler has not done this already, before doing this optimisation in C. Also, you should measure (profile) the performance before and after to ensure that you really are making things faster.

Looping will be far slower than doing the division using the native instruction for reasonably large operands.

Edit: see also this paper.

独闯女儿国 2024-07-31 06:50:53

If your C compiler is targeting a CPU with no divide instruction, you can modify your code as follows:

int mod(int a, int b) {
    int s = b + b + b + b;   /* 4*b, assumed not to overflow */
    int r = a;
    while(r >= s) {
        r -= s;
    }
    while(r >= b) {          /* at most 3 iterations */
        r -= b;
    }
    return r;
}

This works by subtracting 4*b at a time rather than b, right up until less than 4*b remains; then it switches to subtracting b at a time.

This should make your code run about four times as fast (assuming 4*b isn't outside the range of your integers). You could even insert more loops (say an 8*b one) before the 4*b one for even more speed.
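Here is a sketch of the "insert more loops" idea, with tiers of 64*b, 8*b and b. The tier sizes and the name mod_tiered are illustrative choices, not from the answer. The sketch assumes b > 0 and that 64 * b does not overflow int. Each lower tier runs at most 7 times because the tier above it leaves less than 8 of its own units.

```c
/* Hypothetical tiered version: assumes b > 0, a >= 0 and 64*b fits in int. */
int mod_tiered(int a, int b)
{
    int b8 = b * 8, b64 = b * 64;
    int r = a;
    while (r >= b64) r -= b64;
    while (r >= b8)  r -= b8;   /* at most 7 times */
    while (r >= b)   r -= b;    /* at most 7 times */
    return r;
}
```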

Other than that, hand-coding assembler may help but I think you'll find quite a boost from the above code without it.

If you know more detail on the way you'll be using the mod call, you can optimize it for your particular cases. For example, if you only want to know modulo 25 of a 16-bit integer, the following code will be much faster than a simplistic loop with variable denominator.

int mod25 (int a) {                // a has maximum value of 2^15-1 = 32767
    while (a >= 15625) a-= 15625;  // at most 2 times.
    while (a >= 625) a-= 625;      // at most 24 times.
    while (a >= 25) a-= 25;        // at most 24 times.
    return a;
}

Running a test, I find that you have to do 10 million iterations before a noticeable difference appears between that modulo code and the use of the % operator (2 seconds vs. 0 seconds). Up until that point, they were both 0 seconds, although that was run on a fast machine (better for mod25) and with a div instruction (better for % operator) so you'd need to benchmark it on your own hardware.

This is about as fast as you're likely to get without making your code unreadable (although even that shouldn't stop you if you're willing to add lots of comments explaining how it works).

A more general solution for any denominator is to first double the denominator (with bit shifts for speed) as far as possible so that the ensuing subtractions are minimized. Then, as the numerator reduces below the increased denominator, halve the denominator and keep going (until the denominator is back at the start).

unsigned mod (unsigned n, unsigned d) {
    /* dx is the adjusted denom; stop doubling before its top bit would be
       shifted out (unsigned, so the check is well-defined). Assumes d > 0. */
    unsigned dx = d;
    while (((dx << 1) >>1) == dx)
        dx <<= 1;

    /* This loop processes the dx values until they get too small. */
    while (dx >= d) {
        /* This loop subtracts the large dx value. */
        while (n >= dx)
            n -= dx;
        dx >>= 1;
    }
    return n;
}

This actually performs on par with the optimized version of mod25 above while providing a more general solution.

吻安 2024-07-31 06:50:53

Please engage some common sense.

If you could write C code that calculated x % 25 faster than the compiler can, then the compiler would use that faster method.

The original poster made this fantastic assumption that the compiler would use a division. No compiler that I've used in the last ten years would be doing that. It's a multiplication by a constant close to (2^32 / 25) plus some bit twiddling that you won't be able to improve by hand.

There is a remote possibility that you can produce faster code than the compiler for testing whether x % 25 == 0. That's because you don't actually need code that calculates x % 25 correctly; you only need code that yields 0 exactly when x % 25 == 0. The savings will probably be sub-nanosecond.
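One known trick for that divisibility test works for odd divisors: multiply by the divisor's inverse modulo 2^32. Here is a sketch assuming 32-bit uint32_t arithmetic. 0xC28F5C29 is the inverse of 25 (25 * 0xC28F5C29 ≡ 1 mod 2^32). Multiplication by it is a bijection on 32-bit values that sends each multiple 25k to k. So the multiples of 25 land exactly on 0 .. UINT32_MAX / 25, and every other value lands above that range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Divisibility test by 25 without division or remainder.
 * inv25 satisfies 25 * inv25 == 1 (mod 2^32). */
static bool divisible_by_25(uint32_t x)
{
    const uint32_t inv25 = 0xC28F5C29u;
    return (uint32_t)(x * inv25) <= UINT32_MAX / 25u;
}
```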

"How do I calculate x % c optimally for various constants c" is a nice puzzle. Compiler writers like nice puzzles. And they are better at solving nice puzzles like this than you are. Especially since they only need a solution that works for one machine where you would have to produce a general solution.

給妳壹絲溫柔 2024-07-31 06:50:53

If you don't like % operator:

int mod(int a, int b) {
    int integral = a / b;
    return a - (b*integral);
}
你是我的挚爱i 2024-07-31 06:50:53

If you know that b will be a power of 2, you could use a bitwise AND instead of the modulo operator. However, the Wikipedia page for modulo seems to indicate that any C compiler would notice this (when b is a compile-time constant) and optimize out the modulo anyway.
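Here is a minimal sketch of the AND trick, assuming unsigned operands and a b that the caller guarantees is a power of two. For negative signed values, x & (b - 1) and C's x % b can differ. mod_pow2 is a hypothetical name.

```c
/* x % b for unsigned x when b is a power of two (caller's guarantee). */
static unsigned mod_pow2(unsigned x, unsigned b)
{
    return x & (b - 1u);   /* keeps the low log2(b) bits */
}
```

Note that 25 is not a power of two, so this trick does not apply directly to the question.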

情痴 2024-07-31 06:50:53

Possibly not the fastest, but reasonably efficient. I haven't had time to test it, but: use a lookup table of (powers of 2) * 25 up to half the maximum range, then loop over it. E.g. a range up to 3199 needs 7 iterations.

/* named pow25 rather than pow to avoid clashing with pow() from <math.h> */
static const int pow25[] = {25, 50, 100, 200, 400, 800, 1600};

int mod25(int x)
{
    int i = sizeof pow25 / sizeof pow25[0];

    while (i--)
    {
        if (x >= pow25[i])
            x -= pow25[i];
    }
    return x;
}

If you have a very large range but low values are more common, it might be worthwhile using a binary chop (binary search) to find the starting point.
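Here is a sketch of the binary-chop suggestion under stated assumptions: 0 <= x <= INT32_MAX, and pow25x2 and mod25_chop are hypothetical names. The table is extended to 25 * 2^k for k = 0..26. A binary search finds the largest entry not exceeding x, and the compare-and-subtract loop starts from that entry instead of from the top. Whether this beats simply scanning all 27 entries depends on how skewed the inputs are, so it would need measuring.

```c
#include <stdint.h>

/* 25 * 2^k for k = 0..26 (hypothetical table). */
static const int32_t pow25x2[27] = {
    25, 50, 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200,
    102400, 204800, 409600, 819200, 1638400, 3276800, 6553600, 13107200,
    26214400, 52428800, 104857600, 209715200, 419430400, 838860800,
    1677721600
};

/* Assumes 0 <= x <= INT32_MAX. */
int32_t mod25_chop(int32_t x)
{
    if (x < 25) return x;
    int lo = 0, hi = 26;            /* find largest i with pow25x2[i] <= x */
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (pow25x2[mid] <= x) lo = mid; else hi = mid - 1;
    }
    for (int i = lo; i >= 0; --i)   /* usual descending compare-and-subtract */
        if (x >= pow25x2[i]) x -= pow25x2[i];
    return x;
}
```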

软甜啾 2024-07-31 06:50:53
int mod25(int x) {
  static int divisors[] = {2147483625, 244140625, 9765625, 390625, 15625, 625, 25};
  int i;
  for (i = 0; i < sizeof(divisors)/sizeof(int); i++) {
    int divisor = divisors[i];
    while (x >= divisor) {
      x -= divisor;
    }
  }
  return x;
}

How it works: We want to decrement x by large multiples of 25 to reduce the value as fast as possible. When the divisor is too big we switch to a smaller multiple of 25. If the divisor is already down to 25 then we're done.

You could try experimenting with different divisors. You just want to make sure that:

  • they're descending
  • they're all multiples of 25
  • the last value is 25

In the code above I used the largest signed-32-bit multiple of 25 plus the powers of 25, which seems reasonable, though I have to admit that I'm not sure that it's optimal.

(BTW: if your compiler doesn't do constant folding -- which would be very surprising -- then you might want to replace the upper-limit of i with a hard-coded constant.)

始终不够爱げ你 2024-07-31 06:50:53

Why can't you just use the operator %? If this is C code, and the numbers are ordinary "native" ints, then that should be the fastest way, by far.

断桥再见 2024-07-31 06:50:53

Is there a reason why you can't use C's built-in modulus operator?

int a = x % 25;

Following your edit;

If your processor does not have built-in modulo support, I would still use the % operator, for the simple reason that your compiler will know that the processor in question doesn't have a native % instruction, and will likely produce asm code to emulate it optimally.

Put it this way: I'd be fascinated if you can come up with a general algorithm that outperforms whatever the compiler produces from the built-in operator, notwithstanding specific cases (such as simply taking the 2 lowest digits for modulo 100, etc.).

青春如此纠结 2024-07-31 06:50:53

How about:

int y = 0, x = (x & 0x7f); 
while (x > 25) { x -= 25; y++; }

Update: it's pretty wrong :) (masking with 0x7f keeps x % 128, and 128 is not a multiple of 25). But the idea is there.

腻橙味 2024-07-31 06:50:53

I find it pretty strange that the operation x % 25 takes such long time (if you are using the built-in % operator, that is). Most modern processors should do this in a single instruction. I'd look for other reasons that this code takes so long.

EDIT:
Here's an algorithm that might at least give some ideas:

256 ≡ 6 (mod 25)

This means that if we write a number x as bytes x3 x2 x1 x0, we have x ≡ 6^3*x3 + 6^2*x2 + 6*x1 + x0 (mod 25)

This gives an algorithm for reducing the size of x:

int x0 = x & 0xFF, x1 = (x>>8) & 0xFF, x2 = (x>>16) & 0xFF, x3 = (x>>24) & 0xFF;

int y = x3;                       /* Horner's rule in base 6 */
y = (y << 2) + (y << 1) + x2;
y = (y << 2) + (y << 1) + x1;
y = (y << 2) + (y << 1) + x0;

(here (y << 2) + (y << 1) = 4*y + 2*y = 6*y)

After this, y has the same remainder mod 25 as x.
Iterating this 1, 2 or 3 times will make y (at most) a 17-, 11- or 9-bit number, respectively. One of these sizes might be small enough to make a lookup table of.
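Here is a sketch that puts the folding step to work, assuming 32-bit unsigned input. fold256 and mod25_fold are hypothetical names. Three folds bring any value down to at most 297 (bounds 66045, then 1821, then 297). After that, at most 11 subtractions of 25 finish the job, with no multiply, divide or table.

```c
#include <stdint.h>

/* One folding step: since 256 mod 25 == 6, replace bytes x3 x2 x1 x0 by
 * 6^3*x3 + 6^2*x2 + 6*x1 + x0 (Horner form, 6*y computed as 4*y + 2*y). */
static uint32_t fold256(uint32_t x)
{
    uint32_t y = x >> 24;                          /* x3 */
    y = (y << 2) + (y << 1) + ((x >> 16) & 0xFF);  /* 6*y + x2 */
    y = (y << 2) + (y << 1) + ((x >> 8) & 0xFF);   /* 6*y + x1 */
    y = (y << 2) + (y << 1) + (x & 0xFF);          /* 6*y + x0 */
    return y;
}

/* x % 25 for any 32-bit x. */
uint32_t mod25_fold(uint32_t x)
{
    uint32_t y = fold256(fold256(fold256(x)));     /* y <= 297 */
    while (y >= 25u)                               /* at most 11 subtractions */
        y -= 25u;
    return y;
}
```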

I SERIOUSLY doubt that this would be faster than the builtin % operator, though.

挽梦忆笙歌 2024-07-31 06:50:53

If you kept your numbers in BCD or a byte array of digits, this would be pretty easy. Unfortunately, I have no idea what else you're doing in your program with these numbers. Sometimes it pays to look at how you represent your data rather than just bang away at algorithms.
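Here is a sketch of why a decimal representation makes this easy. Because 100 is a multiple of 25, only the last two decimal digits matter. The layout is a hypothetical assumption: digits[] holds one decimal digit (0..9) per byte, most significant first, with len >= 1. The same idea applies to packed BCD once the last byte's two nibbles are extracted.

```c
#include <stddef.h>

/* Hypothetical layout: one decimal digit per byte, most significant first. */
unsigned mod25_digits(const unsigned char *digits, size_t len)
{
    unsigned last2 = digits[len - 1];
    if (len >= 2)
        last2 += 10u * digits[len - 2];
    while (last2 >= 25u)        /* last2 <= 99, so at most 3 subtractions */
        last2 -= 25u;
    return last2;
}
```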

白芷 2024-07-31 06:50:53

Here's an idea

static unsigned table0[256];
static unsigned table1[256];
static unsigned table2[256];
static unsigned table3[256];

// run just once to initialize the tables (assumes 32-bit unsigned)
void initialMod25Tables(void) {
    for (unsigned i = 0; i < 256; ++i) {
        table0[i] = i % 25;
        table1[i] = (i << 8) % 25;
        table2[i] = (i << 16) % 25;
        table3[i] = (i << 24) % 25;   // unsigned, so i << 24 cannot overflow
    }
}

unsigned mod25(unsigned x) {
    unsigned y = table0[x & 0xFF];
    x >>= 8;
    y += table1[x & 0xFF];
    x >>= 8;
    y += table2[x & 0xFF];
    x >>= 8;
    y += table3[x & 0xFF];
    y = table0[y];   // y <= 4 * 24 = 96, so one more lookup finishes the job
    return y;
}
最美的太阳 2024-07-31 06:50:53

A revised general algorithm after working with David Johnstone's answer regarding Pax's algorithm. This considerably reduces loop cycles and should address Skizz's concerns.

unsigned mod(unsigned a, unsigned b) {
    if (a < b) return a;
    unsigned s = b, ret = a;
    while(ret >= b){
        while(s <= ret){
            s <<= 3;
        }
        while (s > ret && s > b) {
             s >>= 3;
        }
        if(s < b) s = b;
        while (ret >= s){
            ret -= s;
        }
    }
    return ret;
}

I've run mod(536870910, 25) as a test case. In theory, the maximum value of a this function can handle without problems is UINT_MAX >> 3, i.e. 536870911 if unsigned int is 32 bits.

unsigned r = mod(536870910, 25); // r will be 10

The function has four while() loops. In order to test the efficiency, I put counters on each loop. In the case of mod(536870910, 25), the while loop counters totaled 8, 9, 9 and 26 respectively. If you used straight subtraction to calculate 536870910 % 25, you would need to loop over 21,000,000 times.

So why try to determine an algorithm that does what the % operator already does? In my case I'm using a similar function to do mod() on very large numbers that are a custom type, so I needed my own algorithm in order to overload the % operator to work with my type. So in my case, the mod() function uses special types and not unsigned ints.

For what it's worth, the <<= 3 and >>= 3 in the function above can instead be <<= 1 and >>= 1. When I was testing, a larger shift seemed to reduce the loop cycles. The important thing is to shift by the same amount in both directions.

哀由 2024-07-31 06:50:53

If you are only considering the number 25, you can use the fact that 25 divides an integer if and only if the last two decimal digits of the integer are 00, 25, 50 or 75. So to get the modulo, take the last two digits and subtract the largest of 00, 25, 50 or 75 that does not exceed them.
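The digit rule follows from 100 being a multiple of 25. Here is a one-line worked example:

```latex
100 = 4 \cdot 25 \;\Rightarrow\; x \equiv (x \bmod 100) \pmod{25},
\qquad\text{e.g. } 1234 \equiv 34 \equiv 34 - 25 = 9 \pmod{25}.
```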
