在 C# 中计算素数的最快方法?

发布于 2024-07-04 00:48:15 字数 1048 浏览 10 评论 0原文

我实际上有我的问题的答案,但它没有并行化,所以我对改进算法的方法感兴趣。 无论如何,它对某些人来说可能很有用。

int Until = 20000000;
BitArray PrimeBits = new BitArray(Until, true);

/*
 * Sieve of Eratosthenes
 * PrimeBits is a simple BitArray where all bit is an integer
 * and we mark composite numbers as false
 */

PrimeBits.Set(0, false); // You don't actually need this, just
PrimeBits.Set(1, false); // remindig you that 2 is the smallest prime

for (int P = 2; P < (int)Math.Sqrt(Until) + 1; P++)
    if (PrimeBits.Get(P))
        // These are going to be the multiples of P if it is a prime
        for (int PMultiply = P * 2; PMultiply < Until; PMultiply += P)
            PrimeBits.Set(PMultiply, false);

// We use this to store the actual prime numbers
List<int> Primes = new List<int>();

for (int i = 2; i < Until; i++)
    if (PrimeBits.Get(i))
        Primes.Add(i);

也许我可以使用多个 BitArray 和 BitArray.And() 将它们放在一起?

I actually have an answer to my question but it is not parallelized so I am interested in ways to improve the algorithm. Anyway it might be useful as-is for some people.

int Until = 20000000;
BitArray PrimeBits = new BitArray(Until, true);

/*
 * Sieve of Eratosthenes
 * PrimeBits is a simple BitArray where all bit is an integer
 * and we mark composite numbers as false
 */

PrimeBits.Set(0, false); // You don't actually need this, just
PrimeBits.Set(1, false); // remindig you that 2 is the smallest prime

for (int P = 2; P < (int)Math.Sqrt(Until) + 1; P++)
    if (PrimeBits.Get(P))
        // These are going to be the multiples of P if it is a prime
        for (int PMultiply = P * 2; PMultiply < Until; PMultiply += P)
            PrimeBits.Set(PMultiply, false);

// We use this to store the actual prime numbers
List<int> Primes = new List<int>();

for (int i = 2; i < Until; i++)
    if (PrimeBits.Get(i))
        Primes.Add(i);

Maybe I could use multiple BitArrays and BitArray.And() them together?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

情话难免假 2024-07-11 00:48:15
    void PrimeNumber(long number)
    {
        bool IsprimeNumber = true;
        long  value = Convert.ToInt32(Math.Sqrt(number));
        if (number % 2 == 0)
        {
            IsprimeNumber = false;
            MessageBox.Show("No It is not a Prime NUmber");
            return;
        }
        for (long i = 3; i <= value; i=i+2)
        {             
           if (number % i == 0)
            {

                MessageBox.Show("It is divisible by" + i);
                IsprimeNumber = false;
                break;
            }

        }
        if (IsprimeNumber)
        {
            MessageBox.Show("Yes Prime NUmber");
        }
        else
        {
            MessageBox.Show("No It is not a Prime NUmber");
        }
    }
    void PrimeNumber(long number)
    {
        bool IsprimeNumber = true;
        long  value = Convert.ToInt32(Math.Sqrt(number));
        if (number % 2 == 0)
        {
            IsprimeNumber = false;
            MessageBox.Show("No It is not a Prime NUmber");
            return;
        }
        for (long i = 3; i <= value; i=i+2)
        {             
           if (number % i == 0)
            {

                MessageBox.Show("It is divisible by" + i);
                IsprimeNumber = false;
                break;
            }

        }
        if (IsprimeNumber)
        {
            MessageBox.Show("Yes Prime NUmber");
        }
        else
        {
            MessageBox.Show("No It is not a Prime NUmber");
        }
    }
狼性发作 2024-07-11 00:48:15

有一篇关于埃拉托色尼筛子的非常好的文章: The Genuine Sieve of Eratosthenes Eratosthenes

它是在函数设置中,但大多数优化也适用于 C# 中的过程实现。

两个最重要的优化是从 P^2 而不是 2*P 开始划掉,并使用轮子来计算下一个素数。

对于并发性,您可以与 P 并行处理直到 P^2 的所有数字,而无需做任何不必要的工作。

There's a very good article about the Sieve of Eratosthenes: The Genuine Sieve of Eratosthenes

It's in a functional setting, but most of the opimization do also apply to a procedural implementation in C#.

The two most important optimizations are to start crossing out at P^2 instead of 2*P and to use a wheel for the next prime numbers.

For concurrency, you can process all numbers till P^2 in parallel to P without doing any unnecessary work.

素衣风尘叹 2024-07-11 00:48:15

您还应该考虑对算法进行可能的更改。

考虑一下,当您找到元素时,将它们简单地添加到列表中可能会更便宜。

也许为您的列表预先分配空间,将使构建/填充的成本更便宜。

You also should consider a possible change of algorithms.

Consider that it may be cheaper to simply add the elements to your list, as you find them.

Perhaps preallocating space for your list, will make it cheaper to build/populate.

执笔绘流年 2024-07-11 00:48:15

你想寻找新的素数吗? 这可能听起来很愚蠢,但您也许能够加载某种具有已知素数的数据结构。 我确信有人有一份清单。 找到现有数字来计算新数字可能是一个更容易的问题。

您还可以查看 Microsoft 的 Parallel FX Library,使您现有的代码成为多线程以利用的多核系统。 通过最少的代码更改,您可以使 for 循环成为多线程。

Are you trying to find new primes? This may sound stupid, but you might be able to load up some sort of a data structure with known primes. I am sure someone out there has a list. It might be a much easier problem to find existing numbers that calculate new ones.

You might also look at Microsofts Parallel FX Library for making your existing code multi-threaded to take advantage of multi-core systems. With minimal code changes you can make you for loops multi-threaded.

岛歌少女 2024-07-11 00:48:15

@DrPizza 分析仅真正有助于改进实现,它不会揭示并行执行的机会,或建议更好的算法(除非您有其他方面的经验,在这种情况下我真的很想看看您的分析器)。

我家里只有单核机器,但运行了相当于 BitArray 筛子的 Java,以及筛子反转的单线程版本 - 将标记素数保存在数组中,并使用 wheel 将搜索空间减少五倍,然后使用每个标记素数以轮的增量标记一个位数组。 它还将存储减少到 O(sqrt(N)) 而不是 O(N),这在最大 N、分页和带宽方面都有帮助。

对于 N 的中等值(1e8 到 1e12),可以很快找到 sqrt(N) 以内的素数,之后您应该能够非常轻松地在 CPU 上并行执行后续搜索。 在我的单核机器上,轮子方法在 28 秒内找到 1e9 以内的素数,而你的筛子(将 sqrt 移出循环后)需要 86 秒 - 改进是由于轮子; 反转意味着您可以处理大于 2^32 的 N,但速度会变慢。 代码可以在此处找到。 在经过 sqrt(N) 之后,您也可以并行化朴素筛结果的输出,因为在该点之后位数组不会被修改; 但是一旦你处理的 N 足够大,数组大小对于整数来说就太大了。

@DrPizza Profiling only really helps improve an implementation, it doesn't reveal opportunities for parallel execution, or suggest better algorithms (unless you've experience to the otherwise, in which case I'd really like to see your profiler).

I've only single core machines at home, but ran a Java equivalent of your BitArray sieve, and a single threaded version of the inversion of the sieve - holding the marking primes in an array, and using a wheel to reduce the search space by a factor of five, then marking a bit array in increments of the wheel using each marking prime. It also reduces storage to O(sqrt(N)) instead of O(N), which helps both in terms of the largest N, paging, and bandwidth.

For medium values of N (1e8 to 1e12), the primes up to sqrt(N) can be found quite quickly, and after that you should be able to parallelise the subsequent search on the CPU quite easily. On my single core machine, the wheel approach finds primes up to 1e9 in 28s, whereas your sieve (after moving the sqrt out of the loop) takes 86s - the improvement is due to the wheel; the inversion means you can handle N larger than 2^32 but makes it slower. Code can be found here. You could parallelise the output of the results from the naive sieve after you go past sqrt(N) too, as the bit array is not modified after that point; but once you are dealing with N large enough for it to matter the array size is too big for ints.

2024-07-11 00:48:15

如果没有分析,我们就无法判断程序的哪一部分需要优化。

如果您在一个大型系统中,那么人们会使用分析器来发现素数生成器是需要优化的部分。

对包含十几条指令的循环进行分析通常不值得 - 与循环体相比,分析器的开销非常大,并且改进如此小的循环的唯一方法是更改​​算法以进行更少的迭代。 因此,IME,一旦您消除了任何昂贵的函数并且有了几行简单代码的已知目标,您最好更改算法并安排端到端运行,而不是尝试通过指令级别改进代码分析。

Without profiling we cannot tell which bit of the program needs optimizing.

If you were in a large system, then one would use a profiler to find that the prime number generator is the part that needs optimizing.

Profiling a loop with a dozen or so instructions in it is not usually worth while - the overhead of the profiler is significant compared to the loop body, and about the only ways to improve a loop that small is to change the algorithm to do fewer iterations. So IME, once you've eliminated any expensive functions and have a known target of a few lines of simple code, you're better off changing the algorithm and timing an end-to-end run than trying to improve the code by instruction level profiling.

请爱~陌生人 2024-07-11 00:48:15

除了并行化之外,您不想在每次迭代时计算 sqrt(Until) 。 您还可以假设 2、3 和 5 的倍数,并且仅计算 {1,5} 中的 N%6 或 {1,7,11,13,17,19,23,29} 中的 N%30。

您应该能够非常轻松地并行化因式分解算法,因为第 N 阶段仅取决于第 sqrt(n) 个结果,因此一段时间后不会出现任何冲突。 但这不是一个好的算法,因为它需要大量除法。

如果您有保证在读取之前完成的编写器工作包,您还应该能够并行化筛选算法。 大多数情况下,编写者不应该与读者发生冲突 - 至少一旦你完成了一些条目,他们应该在读者之上至少 N 之上工作,因此你只需要相当偶尔的同步读取(当 N 超过最后一个同步读取时)价值)。 您不需要在任意数量的写入器线程之间同步 bool 数组,因为不会出现写入冲突(最坏的情况是,多个线程会将 true 写入同一位置)。

主要问题是确保任何正在等待写入的工作人员都已完成。 在 C++ 中,您可以使用比较并设置来切换到在任何时候正在等待的工作线程。 我不是 C# 专家,所以不知道如何使用该语言,但 Win32 InterlockedCompareExchange 函数应该可用。

您还可以尝试基于参与者的方法,因为这样您可以安排使用最低值的参与者,这可能更容易保证您正在读取筛子的有效部分,而不必在每个增量上锁定总线N.

无论哪种方式,您都必须确保所有工作人员在阅读之前都已获得上述条目 N,而这样做的成本就是在并行和串行之间进行权衡的地方。

Parallelisation aside, you don't want to be calculating sqrt(Until) on every iteration. You also can assume multiples of 2, 3 and 5 and only calculate for N%6 in {1,5} or N%30 in {1,7,11,13,17,19,23,29}.

You should be able to parallelize the factoring algorithm quite easily, since the Nth stage only depends on the sqrt(n)th result, so after a while there won't be any conflicts. But that's not a good algorithm, since it requires lots of division.

You should also be able to parallelize the sieve algorithms, if you have writer work packets which are guaranteed to complete before a read. Mostly the writers shouldn't conflict with the reader - at least once you've done a few entries, they should be working at least N above the reader, so you only need a synchronized read fairly occasionally (when N exceeds the last synchronized read value). You shouldn't need to synchronize the bool array across any number of writer threads, since write conflicts don't arise (at worst, more than one thread will write a true to the same place).

The main issue would be to ensure that any worker being waited on to write has completed. In C++ you'd use a compare-and-set to switch to the worker which is being waited for at any point. I'm not a C# wonk so don't know how to do it that language, but the Win32 InterlockedCompareExchange function should be available.

You also might try an actor based approach, since that way you can schedule the actors working with the lowest values, which may be easier to guarantee that you're reading valid parts of the the sieve without having to lock the bus on each increment of N.

Either way, you have to ensure that all workers have got above entry N before you read it, and the cost of doing that is where the trade-off between parallel and serial is made.

尴尬癌患者 2024-07-11 00:48:15

通过使用双向链表交叉引用位数组,您可以节省一些时间,这样您就可以更快地前进到下一个素数。

另外,在消除后面的复合时,一旦您第一次遇到新的质数 p - 剩余的第一个复合倍数将是 p*p,因为之前的所有内容都已被消除。 事实上,您只需将 p 乘以列表中它后面剩余的所有潜在素数,一旦您的乘积超出范围(大于 Until),就停止。

还有一些很好的概率算法,例如 Miller-Rabin 测试。 维基百科页面是一个很好的介绍。

You might save some time by cross-referencing your bit array with a doubly-linked list, so you can more quickly advance to the next prime.

Also, in eliminating later composites once you hit a new prime p for the first time - the first composite multiple of p remaining will be p*p, since everything before that has already been eliminated. In fact, you only need to multiply p by all the remaining potential primes that are left after it in the list, stopping as soon as your product is out of range (larger than Until).

There are also some good probabilistic algorithms out there, such as the Miller-Rabin test. The wikipedia page is a good introduction.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文