如何优化我的 C/x86 代码?

发布于 2024-08-20 20:30:33 字数 557 浏览 14 评论 0原文

int lcm_old(int a, int b) {
    int n;
    for(n=1;;n++)
        if(n%a == 0 && n%b == 0)
            return n;  
}

int lcm(int a,int b)  {
    int n = 0;
    __asm {
lstart:
        inc n;
        mov eax, n;
        mov edx, 0;
        idiv a;
        mov eax, 0;
        cmp eax, edx;
        jne lstart;
        mov eax, n;
        mov edx, 0;
        idiv b;
        mov eax, 0;
        cmp eax, edx;
        jnz lstart;
    }
    return n;
}

我试图用我自己的函数(底部)击败/匹配顶部函数的代码。您对如何优化我的日常工作有什么想法吗?

PS。这只是为了好玩。

int lcm_old(int a, int b) {
    int n;
    for(n=1;;n++)
        if(n%a == 0 && n%b == 0)
            return n;  
}

int lcm(int a,int b)  {
    int n = 0;
    __asm {
lstart:
        inc n;
        mov eax, n;
        mov edx, 0;
        idiv a;
        mov eax, 0;
        cmp eax, edx;
        jne lstart;
        mov eax, n;
        mov edx, 0;
        idiv b;
        mov eax, 0;
        cmp eax, edx;
        jnz lstart;
    }
    return n;
}

I'm trying to beat/match the code for the top function with my own function (bottom). Have you got any ideas how I can optimize my routine?

PS. This is just for fun.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

猫性小仙女 2024-08-27 20:30:33

我会使用不同的算法进行优化。像你一样线性搜索是非常慢的。事实上,两个自然数的最小公倍数是它们的乘积除以它们的最大公约数的商。您可以使用欧几里得算法快速计算最大公约数。

因此:

int lcm(int a, int b) {
    int p = a * b;
    return p / gcd(a, b);
}

您需要在其中实现gcd(int, int)。由于欧几里得算法的平均步骤数为O(log n),我们轻松击败了简单的线性搜索。

对于这个问题还有其他方法。如果您有一种可以快速分解整数的算法(例如量子计算机),那么您也可以像这样解决这个问题。如果将 ab 分别写入其规范素数分解,

a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am
b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn

ab 的最小公倍数是通过对 ab 至少其中一个分解中出现的每个质因数取它在 a 分解中出现的最大指数来获得的代码>a或b。例如:

28 = 2^2 * 7
312 = 2^3 * 39

所有这些都是为了

lcm(28, 312) = 2^3 * 7 * 39 = 2184

指出,朴素的方法因其简单性而令人钦佩,但您可以花一整天的时间来优化它们的每一纳秒,但仍然无法击败高级算法。

I would optimize by using a different algorithm. Searching linearly like you are doing is super-slow. It's a fact that the least common mulitple of two natural numbers is the quotient of their product divided by their greatest common divisor. You can compute the greatest common divisor quickly using the Euclidean algorithm.

Thus:

int lcm(int a, int b) {
    int p = a * b;
    return p / gcd(a, b);
}

where you need to implement gcd(int, int). As the average number of steps in the Euclidean algorithm is O(log n), we beat the naive linear search hands down.

There are other approaches to this problem. If you had an algorithm that could quickly factor integers (say a quantum computer) then you can also solve this problem like so. If you write each of a and b into its canonical prime factorization

a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am
b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn

then the least common multiple of a and b is the obtained by taking for each prime factor appearing in at least one of the factorizations of a and b, taking it with the maximum exponent that it appears in the factorization of a or b. For example:

28 = 2^2 * 7
312 = 2^3 * 39

so that

lcm(28, 312) = 2^3 * 7 * 39 = 2184

All of this is to point out that naive approaches are admirable in their simplicity but you can spend all day optimizing every last nanosecond out of them and still not beat a superior algorithm.

放低过去 2024-08-27 20:30:33

我假设您想保留相同的算法。这至少应该是一个稍微更有效的实现。主要区别在于循环中的代码仅使用寄存器,而不使用内存。

int lcm(int a,int b)  {
    __asm {
        xor ecx, ecx
        mov esi, a
        mov edi, b
lstart:
        inc ecx
        mov eax, ecx
        xor edx, edx
        idiv esi
        test edx, edx
        jne lstart
        mov eax, ecx;
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

然而,正如 Jason 指出的那样,这确实不是一个非常有效的算法——乘法、求 GCD 和除法通常会更快(除非 ab相当小)。

编辑:还有另一种算法几乎更容易理解,它也应该快得多(比原来的算法——不是乘法,然后除以 GCD)。不要生成连续的数字,直到找到一个能整除 ab 的数字,而是生成一个的连续倍数(最好是较大的数字),直到找到一个能被另一个整除的数字:

int lcm2(int a, int b) { 
    __asm { 
        xor ecx, ecx
        mov esi, a
        mov edi, b
    lstart:
        add ecx, esi
        mov eax, ecx
        xor edx, edx
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

这仍然很容易理解,但应该比原来有相当大的改进。

I'm going to assume you want to keep the same algorithm. This should at least be a slightly more efficient implementation of it. The main difference is that the code in the loop only uses registers, not memory.

int lcm(int a,int b)  {
    __asm {
        xor ecx, ecx
        mov esi, a
        mov edi, b
lstart:
        inc ecx
        mov eax, ecx
        xor edx, edx
        idiv esi
        test edx, edx
        jne lstart
        mov eax, ecx;
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

As Jason pointed out, however, this really isn't a very efficient algorithm -- multiplying, finding the GCD, and dividing will normally be faster (unless a and b are quite small).

Edit: there is another algorithm that's almost simpler to understand, that should also be a lot faster (than the original -- not than multiplying, then dividing by GCD). Instead of generating consecutive numbers until you find one that divides both a and b, generate consecutive multiples of one (preferably the larger) until you find one that divides evenly by the other:

int lcm2(int a, int b) { 
    __asm { 
        xor ecx, ecx
        mov esi, a
        mov edi, b
    lstart:
        add ecx, esi
        mov eax, ecx
        xor edx, edx
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

This remains dead simple to understand, but should give a considerable improvement over the original.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文