Fibonacci pthreads c programming-languages

Recursive Fib with Threads, Segmentation Fault?

Posted on 10-18 10:56 · 967 words · 10 views · 0 comments

Any ideas why this works fine for small values like 0, 1, 2, 3, 4... but segfaults for values greater than 15?

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *fib(void *fibToFind);

main(){
    pthread_t mainthread;

    long fibToFind = 15;
    long finalFib;

    pthread_create(&mainthread,NULL,fib,(void*) fibToFind);

    pthread_join(mainthread,(void*)&finalFib);

    printf("The number is: %d\n",finalFib);
}


void *fib(void *fibToFind){
    long retval;

    long newFibToFind = ((long)fibToFind);

    long returnMinusOne;
    long returnMinustwo;

    pthread_t minusone;
    pthread_t minustwo;

    if(newFibToFind == 0 || newFibToFind == 1)
        return newFibToFind;

    else{
        long newFibToFind1 = ((long)fibToFind) - 1;
        long newFibToFind2 = ((long)fibToFind) - 2;

        pthread_create(&minusone,NULL,fib,(void*) newFibToFind1);
        pthread_create(&minustwo,NULL,fib,(void*) newFibToFind2);

        pthread_join(minusone,(void*)&returnMinusOne);
        pthread_join(minustwo,(void*)&returnMinustwo);

        return returnMinusOne + returnMinustwo;

    }

}

白日梦 2024-10-25 10:56:16

Is it running out of memory (no space left for stacks), or out of valid thread handles?

You're asking for an awful lot of threads, which require lots of stack/context.
Windows (and Linux) have a stupid "big [contiguous] stack" idea.

From the documentation on pthread_create:
"On Linux/x86-32, the default stack size for a new thread is 2 megabytes."
If you manufacture 10,000 threads, you need 20 GB just for their stacks.
I built a version of OP's program, and it bombed with some 3500 (p)threads
on Windows XP64.
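
The stack estimate above is simple arithmetic and can be checked directly. The sketch below asks the pthreads library for its default per-thread stack size and multiplies it by a thread count. The helper names `default_stack_bytes` and `stack_bytes_for` are made up for illustration, and the default that gets reported depends on the platform and its resource limits.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: the default stack size for new threads,
 * or 0 if it could not be queried. */
size_t default_stack_bytes(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Hypothetical helper: bytes of stack reserved for `threads` threads
 * that each get `per_thread` bytes. */
unsigned long long stack_bytes_for(unsigned long long threads, size_t per_thread)
{
    return threads * (unsigned long long)per_thread;
}
```

With 2 MB stacks, `stack_bytes_for(10000, 2 * 1024 * 1024)` comes to 20,971,520,000 bytes, which is the roughly 20 GB quoted above. Requesting smaller stacks with `pthread_attr_setstacksize` lowers the cost per thread, but it does nothing about the exponential growth in the number of threads.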

See this SO thread for more details on why big stacks are a really bad idea:
Why are stack overflows still a problem?

If you give up on big stacks, and implement a parallel language with heap allocation
for activation records (our PARLANSE is one of these), the problem goes away.

Here's the first (sequential) program we wrote in PARLANSE:

(define fibonacci_argument 45)

(define fibonacci
   (lambda(function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? 1)
           ?
         (+ (fibonacci (-- ?))
              (fibonacci (- ? 2))
           )+
   )ifthenelse  
   )lambda
 )define

Here's an execution run on an i7:

 C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonaccisequential
 Starting Sequential Fibonacci(45)...Runtime: 33.752067 seconds
 Result: 1134903170

Here's the second, which is parallel:

(define coarse_grain_threshold 30) ; technology constant: tune to amortize fork overhead across lots of work

(define parallel_fibonacci
   (lambda (function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? coarse_grain_threshold)
           (fibonacci ?)
           (let (;; [n natural ] [m natural ]  )
                   (value (|| (= m (parallel_fibonacci (-- ?)) )=
                              (= n (parallel_fibonacci (- ? 2)) )=
                          )||
                          (+ m n)
                   )value
           )let
       )ifthenelse  
   )lambda
)define

Making the parallelism explicit makes the programs a lot easier to write, too.

We test the parallel version by calling (parallel_fibonacci 45). Here
is the execution run on the same i7 (which arguably has 8 processors,
but it is really 4 hyperthreaded cores, so it isn't quite 8
equivalent CPUs):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallelcoarse
Parallel Coarse-grain Fibonacci(45) with cutoff 30...Runtime: 5.511126 seconds
Result: 1134903170

A speedup of about 6.1x (33.75 s down to 5.51 s), not bad for not-quite-8 processors.
One of the other answers to this question ran the pthreads version; it took
"a few seconds" (to blow up) computing Fib(18), whereas this takes 5.5 seconds for Fib(45).
This tells you pthreads is a fundamentally bad way to do lots of fine-grain
parallelism, because it has really, really high forking overhead. (PARLANSE is
designed to minimize that forking overhead.)

Here's what happens if you set the technology constant to zero (so that it forks on every call
to fib):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallel
Starting Parallel Fibonacci(45)...Runtime: 15.578779 seconds
Result: 1134903170

You can see that amortizing fork overhead is a good idea, even if you have fast forks.
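
The same coarse-grain idea can be carried over to pthreads. The following is only a sketch under stated assumptions, not a port of the PARLANSE program: `COARSE_GRAIN_THRESHOLD`, `fib_seq`, `fib_par` and `struct fib_task` are names invented here, and the cutoff of 30 simply mirrors the constant above. It forks one thread per level only while n is above the cutoff, and runs the ordinary sequential recursion below it.

```c
#include <pthread.h>

#define COARSE_GRAIN_THRESHOLD 30  /* assumed cutoff, mirrors the constant above */

/* Plain sequential Fibonacci, used at or below the cutoff. */
static long fib_seq(long n)
{
    return n <= 1 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

struct fib_task {
    long n;       /* input */
    long result;  /* output, written by the child thread */
};

static void *fib_par_thread(void *arg);

/* Forks only while n is above the cutoff; otherwise runs sequentially. */
long fib_par(long n)
{
    if (n <= COARSE_GRAIN_THRESHOLD)
        return fib_seq(n);

    struct fib_task child = { n - 1, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, fib_par_thread, &child) != 0) {
        /* Could not fork: do all of the work in this thread instead. */
        return fib_seq(n - 1) + fib_seq(n - 2);
    }

    long second = fib_par(n - 2);  /* this thread takes the other branch */
    pthread_join(tid, NULL);       /* child.result is valid after the join */
    return child.result + second;
}

static void *fib_par_thread(void *arg)
{
    struct fib_task *task = arg;
    task->result = fib_par(task->n);
    return NULL;
}
```

Calling `fib_par(45)` this way creates threads only for the calls above the cutoff, a few hundred of them, instead of one thread for each of the billions of calls in the full recursion tree. The right cutoff is machine-dependent and would need tuning.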

Fib(45) produces a lot of grains. Heap allocation
of activation records solves the OP's first-order problem (thousands of pthreads, each
with 1 MB of stack, burn gigabytes of RAM).

But there's a second-order problem: forking on every call, Fib(45) creates billions of
PARLANSE "grains" (2^45 is a loose upper bound), and they will burn all your memory too,
just keeping track of the grains, even if your grain control block is tiny.
So it helps to have a scheduler that throttles forks once you have "a lot" of grains
(for some definition of "a lot" significantly less than 2^45), to prevent the
explosion of parallelism from swamping the machine with "grain" tracking data structures.
It has to unthrottle forks when the number of grains falls below a threshold
too, to make sure there is always lots of logical, parallel work for the physical
CPUs to do.
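
A crude version of that throttling can be sketched with pthreads and a C11 atomic counter. This is not how PARLANSE's scheduler works; it is a minimal illustration with invented names (`MAX_LIVE_THREADS`, `live_threads`, `fib_throttled`) and an arbitrary budget of 64. A call forks only if a slot is free, otherwise it does the work inline, and a finishing thread gives its slot back so that forking resumes.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_LIVE_THREADS 64  /* assumed budget; tune per machine */

static atomic_int live_threads = 0;  /* worker threads currently alive */

struct throttled_task {
    long n;       /* input */
    long result;  /* output, written by the child thread */
};

static long fib_throttled(long n);

static void *throttled_thread(void *arg)
{
    struct throttled_task *task = arg;
    task->result = fib_throttled(task->n);
    atomic_fetch_sub(&live_threads, 1);  /* release the slot: forking may resume */
    return NULL;
}

static long fib_throttled(long n)
{
    if (n <= 1)
        return n;

    struct throttled_task child = { n - 1, 0 };
    pthread_t tid;
    int forked = 0;

    /* Reserve a slot first; hand it back if we are over budget
     * or if thread creation fails. */
    if (atomic_fetch_add(&live_threads, 1) < MAX_LIVE_THREADS &&
        pthread_create(&tid, NULL, throttled_thread, &child) == 0) {
        forked = 1;
    } else {
        atomic_fetch_sub(&live_threads, 1);
    }

    long second = fib_throttled(n - 2);
    long first;

    if (forked) {
        pthread_join(tid, NULL);
        first = child.result;
    } else {
        first = fib_throttled(n - 1);  /* throttled: compute inline */
    }
    return first + second;
}
```

The number of live threads never exceeds the budget, so the program cannot run out of stacks however large n gets. It still pays thread-creation cost far too often to be fast; in practice this would be combined with a grain-size cutoff.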

梦言归人 2024-10-25 10:56:16

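You aren't checking for errors, in particular from pthread_create(). When pthread_create() fails, the pthread_t variable is left undefined, and the subsequent pthread_join() may crash.

If you do check for errors, you will find that pthread_create() is failing. This is because you are trying to create almost 2000 threads; with default settings, that would require 16 GB of thread stacks alone.

You should revise your algorithm so that it does not create so many threads.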
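
A minimal sketch of what that checking could look like follows. The helpers `spawn_checked` and `join_checked` and the demo routine `echo_thread` are hypothetical names introduced here. The one detail worth remembering is that the pthread functions return the error number directly and do not set errno, so the message comes from `strerror` on the return value.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: starts a thread and reports failure, so the caller
 * never joins an undefined pthread_t. Returns 0 on success. */
int spawn_checked(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    int err = pthread_create(tid, NULL, fn, arg);
    if (err != 0) {
        /* pthread functions return the error number; errno is not set. */
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
    }
    return err;
}

/* Hypothetical helper: joins a thread and stores its return value,
 * interpreted as an integer, in *out. Returns 0 on success. */
int join_checked(pthread_t tid, long *out)
{
    void *ret = NULL;
    int err = pthread_join(tid, &ret);
    if (err != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(err));
        return err;
    }
    *out = (long)(intptr_t)ret;
    return 0;
}

/* Trivial thread routine for demonstration: returns its argument unchanged. */
void *echo_thread(void *arg)
{
    return arg;
}
```

A caller would only call `join_checked` when `spawn_checked` returned 0; on failure it can fall back to computing the value in the current thread, or abort with a clear message instead of a segmentation fault.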

仙气飘飘 2024-10-25 10:56:16

I tried to run your code, and came across several surprises:

printf("The number is: %d\n", finalFib);

This line has a small error: %d tells printf to expect an int, but it is passed a long int. On platforms where int and long have the same size this happens to work, but on most 64-bit Unix systems long is wider than int and the mismatch is undefined behavior. Pedantically speaking (or if you just want to stop the warning from coming up, which is a very noble ideal too), you should use %ld instead, which expects a long int.
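
As a small illustration of the matching conversion, the sketch below formats a long with %ld into a buffer. The wrapper `format_long` is a made-up name used only so the result can be inspected; the original program would simply change its printf format string.

```c
#include <stdio.h>

/* Hypothetical wrapper: formats a long with the matching %ld conversion.
 * Returns the number of characters snprintf wanted to write. */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "The number is: %ld\n", value);
}
```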

Your fib function, on the other hand, seems non-functional. Testing it on my machine, it doesn't crash, but it yields 1047, which is not a Fibonacci number. Looking closer, your program seems incorrect in several respects:

void *fib(void *fibToFind)
{
    long retval; // retval is never used

    long newFibToFind = ((long)fibToFind);

    long returnMinusOne; // variable is read but never initialized
    long returnMinustwo; // variable is read but never initialized

    pthread_t minusone; // variable is never used (?)
    pthread_t minustwo; // variable is never used

    if(newFibToFind == 0 || newFibToFind == 1)
        // you miss a cast here (but you really shouldn't do it this way)
        return newFibToFind;        
    else{
        long newFibToFind1 = ((long)fibToFind) - 1; // variable is never used
        long newFibToFind2 = ((long)fibToFind) - 2; // variable is never used
        // reading undefined variables (and missing a cast)
        return returnMinusOne + returnMinustwo;

    }
}

Always pay attention to compiler warnings: when you get one, you usually are doing something fishy.

Maybe you should revise the algorithm a little: right now, all your function does is return the sum of two undefined values, hence the 1047 I got earlier.

Implementing the Fibonacci sequence using a recursive algorithm means you need to call the function again. As others noted, it's quite an inefficient way of doing it, but it's easy, so I guess all computer science teachers use it as an example.

The regular recursive algorithm looks like this:

int fibonacci(int iteration)
{
    // base cases: F(0) = 0 and F(1) = 1
    if (iteration == 0 || iteration == 1)
        return iteration;

    return fibonacci(iteration - 1) + fibonacci(iteration - 2);
}

I don't know to what extent you were supposed to use threads: just run the algorithm on a secondary thread, or create new threads for each call? Let's assume the first for now, since it's a lot more straightforward.

Casting integers to pointers and vice versa is bad practice because, if you look at things at a higher level, they are very different: integers do maths, and pointers resolve memory addresses. It happens to work here because long and void* have the same size on common Unix platforms, but that is not guaranteed, and really, you shouldn't do this. Instead, you might notice that the function called to run your new thread accepts a void* argument: we can use it to convey both where the input is, and where the output will be.

So building upon my previous fibonacci function, you could use this code as the thread main routine:

void* fibonacci_offshored(void* pointer)
{
    int* pointer_to_number = pointer;
    int input = *pointer_to_number;
    *pointer_to_number = fibonacci(input);
    return NULL;
}

It expects a pointer to an integer, takes its input from it, then writes its output there.¹ You would then create the thread like this:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main()
{
    int value = 15;
    pthread_t thread;

    // on input, value should contain the number of iterations;
    // after the end of the function, it will contain the result of
    //  the fibonacci function
    int result = pthread_create(&thread, NULL, fibonacci_offshored, &value);

    // error checking is important! try to crash gracefully at the very least
    // (pthread functions return the error code and do not set errno,
    //  so use strerror on the return value rather than perror)
    if (result != 0)
    {
        fprintf(stderr, "pthread_create: %s\n", strerror(result));
        return 1;
    }

    result = pthread_join(thread, NULL);
    if (result != 0)
    {
        fprintf(stderr, "pthread_join: %s\n", strerror(result));
        return 1;
    }

    // now, value contains the output of the fibonacci function
    // (note that value is an int, so just %d is fine)
    printf("The value is %d\n", value);
    return 0;
}

If you need to call the Fibonacci function from new distinct threads (please note: that's not what I'd advise, and others seem to agree with me; it will just blow up for a sufficiently large number of iterations), you'll first need to merge the fibonacci function with the fibonacci_offshored function. It will considerably bulk it up, because dealing with threads is heavier than dealing with regular functions.

void* threaded_fibonacci(void* pointer)
{
    int* pointer_to_number = pointer;
    int input = *pointer_to_number;

    // base cases: F(0) = 0 and F(1) = 1, so the value is already in place
    if (input == 0 || input == 1)
        return NULL;

    // we need one argument per thread
    int minus_one_number = input - 1;
    int minus_two_number = input - 2;

    pthread_t minus_one;
    pthread_t minus_two;
    int error;
    // don't forget to check! especially that in a recursive function where the
    // recursion set actually grows instead of shrinking, you're bound to fail
    // at some point
    // (pthread functions return the error code and do not set errno)
    error = pthread_create(&minus_one, NULL, threaded_fibonacci, &minus_one_number);
    if (error != 0)
    {
        fprintf(stderr, "pthread_create: %s\n", strerror(error));
        *pointer_to_number = 0;
        return NULL;
    }
    error = pthread_create(&minus_two, NULL, threaded_fibonacci, &minus_two_number);
    if (error != 0)
    {
        fprintf(stderr, "pthread_create: %s\n", strerror(error));
        // the first thread still writes to our local variable:
        // wait for it before this stack frame goes away
        pthread_join(minus_one, NULL);
        *pointer_to_number = 0;
        return NULL;
    }

    // always join both threads before returning, even if one join fails
    int join_one = pthread_join(minus_one, NULL);
    int join_two = pthread_join(minus_two, NULL);
    if (join_one != 0 || join_two != 0)
    {
        fprintf(stderr, "pthread_join: %s\n",
                strerror(join_one != 0 ? join_one : join_two));
        *pointer_to_number = 0;
        return NULL;
    }

    // after the joins, each variable holds the result of its thread
    *pointer_to_number = minus_one_number + minus_two_number;
    return NULL;
}

Now that you have this bulky function, adjustments to your main function are going to be quite easy: just replace the reference to fibonacci_offshored with threaded_fibonacci.

int main()
{
    int value = 15;
    pthread_t thread;

    int result = pthread_create(&thread, NULL, threaded_fibonacci, &value);

    if (result != 0)
    {
        // pthread_create returns the error code and does not set errno
        fprintf(stderr, "pthread_create: %s\n", strerror(result));
        return 1;
    }
    pthread_join(thread, NULL);

    printf("The value is %d\n", value);
    return 0;
}

You might have been told that threads speed up parallel processes, but there is a point at which setting up a thread costs more than running its contents. This is a very good example of such a situation: the threaded version of the program runs much, much slower than the non-threaded one.

For the record, this program runs out of threads on my machine when the number of desired iterations is 18, and takes a few seconds to run. By comparison, using an iterative implementation, we never run out of threads, and we have our answer in a matter of milliseconds. It's also considerably simpler. This would be a great example of how using a better algorithm fixes many problems.
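
For comparison, an iterative implementation could look like the sketch below. It assumes the usual convention F(0) = 0 and F(1) = 1, and the name `fib_iterative` is chosen here for illustration. It uses no recursion and no threads, and with a 64-bit unsigned result it is exact up to n = 93.

```c
/* Iterative Fibonacci with F(0) = 0 and F(1) = 1.
 * Exact for n <= 93; larger n wraps around a 64-bit unsigned value. */
unsigned long long fib_iterative(unsigned int n)
{
    unsigned long long prev = 0;  /* F(i - 1) */
    unsigned long long curr = 1;  /* F(i) */

    if (n == 0)
        return 0;

    for (unsigned int i = 1; i < n; i++) {
        unsigned long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

`fib_iterative(45)` needs 44 additions to produce 1134903170, the same result the PARLANSE runs report for Fibonacci(45).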

Also, out of curiosity, it would be interesting to see if it crashes on your machine, and where/how.

¹ Usually, you should try to avoid changing the meaning of a variable between its value on input and its value after the function returns. For instance, here, on input, the variable is the number of iterations we want; on output, it's the result of the function. Those are two very different meanings, and that's not really good practice. I didn't feel like using dynamic allocation to return a value through the void* return value.
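
One way to keep the two meanings apart without dynamic allocation is to pass a small struct with separate input and output fields. The sketch below is an assumption-laden illustration rather than the answer's own code: `struct fib_job`, `fib_plain`, `fib_job_run` and `fib_on_thread` are names invented here, and the struct lives on the caller's stack, which is safe only because the caller joins the thread before returning.

```c
#include <pthread.h>

/* Hypothetical job record: input and output kept in separate fields. */
struct fib_job {
    int  input;   /* which Fibonacci number to compute */
    long result;  /* filled in by the thread */
};

/* Ordinary recursive Fibonacci with F(0) = 0 and F(1) = 1. */
static long fib_plain(int n)
{
    return n <= 1 ? n : fib_plain(n - 1) + fib_plain(n - 2);
}

/* Thread routine: reads job->input, writes job->result. */
static void *fib_job_run(void *arg)
{
    struct fib_job *job = arg;
    job->result = fib_plain(job->input);
    return NULL;
}

/* Runs the computation on one secondary thread.
 * Returns 0 on success, or the pthread error code. */
int fib_on_thread(int n, long *out)
{
    struct fib_job job = { n, 0 };
    pthread_t tid;

    int err = pthread_create(&tid, NULL, fib_job_run, &job);
    if (err != 0)
        return err;

    err = pthread_join(tid, NULL);
    if (err != 0)
        return err;

    *out = job.result;  /* safe to read: the thread has finished */
    return 0;
}
```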
