Pass by reference in lambda function, multithreaded C++

Posted on 2025-01-24 19:35:02


The code below is supposed to measure the runtime of the sin and cos functions with different numbers of threads. I am writing this for a project where runtime matters a lot, and it is a feasibility study of whether multithreading will reduce the runtime enough.

The idea is to vary SAMPLE_SIZE and NUM_THREADS and see how they affect the runtime.
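For a sweep over SAMPLE_SIZE and NUM_THREADS, a small timing helper keeps the measurement separate from the work being measured. This is a minimal sketch; `seconds_for` is a hypothetical name, and it uses `std::chrono::steady_clock` rather than the `system_clock` in the code below, because `steady_clock` is monotonic and so better suited to measuring intervals.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed time in seconds.
// steady_clock never jumps backwards, unlike system_clock, which can be
// adjusted while the benchmark is running.
template <class F>
double seconds_for(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

For example, `seconds_for([&] { /* spawn and join the threads */ })` could be called once per NUM_THREADS value, so that only the spawn-and-join part is timed.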

Problem: the output is not what I expected.

The ID inside the void function cos_sin_multiplication is always shifted up by one, so I get (ID: 1 ... ID: NUM_THREADS) instead of (ID: 0 ... ID: NUM_THREADS-1).
When I run the code with 2, 3, or 4 threads, I get a segmentation fault.
When I run it with 7 or more threads, several IDs change to NUM_THREADS.
The output of cos_out[0] is always 0.

Here is an example output for NUM_THREADS = 8 and SAMPLE_SIZE = 100'000:

Initiate Thread: 0 with 12500 datapoints.
Initiate Thread: 1 with 12500 datapoints.
Initiate Thread: 2 with 12500 datapoints.
Initiate Thread: 3 with 12500 datapoints.
Initiate Thread: 4 with 12500 datapoints.
Initiate Thread: 5 with 12500 datapoints.
Initiate Thread: 6 with 12500 datapoints.
Initiate Thread: 7 with 12500 datapoints.
ID: 4: sin: 0.861292 cos: -1.72477
ID: 8: sin: -56.1798 cos: 55.4332
ID: 8: sin: -68.1969 cos: 51.9351
ID: 3: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 1: sin: 0.861292 cos: -1.72477
ID: 8: sin: -61.1793 cos: 58.8878
ID: 8: sin: -64.8086 cos: 59.5946
The execution took: 0.004465 seconds. 
ID: 0: sin: 59.5946 cos: 0
ID: 1: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 3: sin: 0.861292 cos: -1.72477
ID: 4: sin: 0.861292 cos: -1.72477
ID: 5: sin: 0 cos: 0
ID: 6: sin: 0 cos: 0
ID: 7: sin: 0 cos: 0

Can anyone point me in the right direction?

// Multithreaded cosine and sine calculation benchmark
// Calculate a sample of cosines and sines with different numbers of threads
// to determine the runtime gain for different numbers of threads

#include <math.h>

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

#define NUM_THREADS 3
#define SAMPLE_SIZE 2000000
#define PI 3.1415

float diff_time;

std::ofstream calc_speed;
std::mutex out_guard;

void cos_sin_multiplication(int id, int sample, float theta, float& value, float& sin_out, float& cos_out){
    for (int j = 0; j < sample; j++){
        sin_out += sin(PI*theta);
        cos_out += cos(PI*theta);
        theta += 0.1;
    }
    out_guard.lock();
    std::cout << "ID: " << id << ": sin: " << sin_out << " cos: " << cos_out << "\n";
    out_guard.unlock();
}

int main(){
    auto start_time = std::chrono::system_clock::now();

    std::vector<std::thread> Threads;

    int64_t sample_per_thread;
    int mod_sample_per_thread = SAMPLE_SIZE%NUM_THREADS;
    float value[SAMPLE_SIZE];

    float theta = 0.0;
    float cos_out[NUM_THREADS];
    float sin_out[NUM_THREADS];
    

    for(int i = 0; i <  NUM_THREADS; i++){
        cos_out[i] = 0.0;
        sin_out[i] = 0.0;
    }

    for(int i = 0; i < NUM_THREADS; i++){   
        
        if (i < mod_sample_per_thread){
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS + 1;
        }
        else{
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS;
        }
        out_guard.lock();
        std::cout << "Initiate Thread: " << i <<" with "<< sample_per_thread << " datapoints." << "\n";
        out_guard.unlock();

        Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

    for(auto& t: Threads){
        t.join();
    }

    auto end_time = std::chrono::system_clock::now();
    std::chrono::duration<double> diff_time = end_time - start_time;
    out_guard.lock();
    std::cout << "The execution took: " << diff_time.count() << " seconds. \n";
    out_guard.unlock();

    for(int i = 0; i < NUM_THREADS; i++){
        out_guard.lock();
        std::cout << "ID: " << i << ": sin: " << sin_out[i] << " cos: " << cos_out[i] << "\n";
        out_guard.unlock();
    }
    return 0;
}

Solution:
Replace [&] with [&, i=i, sample_per_thread=sample_per_thread] so that only the things that need to be captured by reference are captured by reference.
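A sketch of the corrected launch loop, assuming C++11 and the same arithmetic as the question: `i` and `sample_per_thread` are captured by value, which has the same effect as the `[&, i=i, sample_per_thread=sample_per_thread]` capture in the solution, and only the output storage is captured by reference. `ThreadResult` and `run_benchmark` are hypothetical names, and the worker is an adapted version of the question's `cos_sin_multiplication`. Each thread writes to its own element of a vector that is sized before any thread starts, and all threads are joined before the vector is returned. The sketch also leaves out the unused `value` array; in the question it is a local array of SAMPLE_SIZE floats (8,000,000 bytes for 2000000), which is close to the common 8 MiB default stack limit on Linux and may be worth checking separately.

```cpp
#include <cmath>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

constexpr double PI = 3.1415;  // same approximation as the question's #define

std::mutex out_guard;

// Per-thread result; each thread writes only to its own element.
struct ThreadResult {
    int id = -1;
    std::int64_t samples = 0;
    float sin_sum = 0.0f;
    float cos_sum = 0.0f;
};

// Adapted from the question's worker: same loop, but results go into `out`.
void cos_sin_multiplication(int id, std::int64_t sample, float theta, ThreadResult& out) {
    out.id = id;
    out.samples = sample;
    for (std::int64_t j = 0; j < sample; j++) {
        out.sin_sum += std::sin(PI * theta);
        out.cos_sum += std::cos(PI * theta);
        theta += 0.1f;
    }
    std::lock_guard<std::mutex> lock(out_guard);
    std::cout << "ID: " << id << ": sin: " << out.sin_sum << " cos: " << out.cos_sum << "\n";
}

// Hypothetical driver; assumes num_threads >= 1.
std::vector<ThreadResult> run_benchmark(int num_threads, std::int64_t sample_size) {
    std::vector<ThreadResult> results(num_threads);  // never resized while threads run
    std::vector<std::thread> threads;
    const std::int64_t remainder = sample_size % num_threads;

    for (int i = 0; i < num_threads; i++) {
        const std::int64_t sample_per_thread = sample_size / num_threads + (i < remainder ? 1 : 0);
        // i and sample_per_thread are copied into the closure right here,
        // so later loop iterations cannot change what this thread sees.
        threads.emplace_back([&results, i, sample_per_thread]() {
            cos_sin_multiplication(i, sample_per_thread, 0.0f, results[i]);  // theta starts at 0, as in the question
        });
    }
    for (auto& t : threads) {
        t.join();  // every thread finishes before `results` is returned
    }
    return results;
}
```

Calling `run_benchmark(8, 100000)` should print IDs 0 through 7, each thread having processed 12500 samples, in whatever order the threads happen to finish.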


才能让你更想念 2025-01-31 19:35:02


Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

C++ gives you no guarantees whatsoever about exactly when the execution thread will actually start executing this closure. The only thing you can rely on is that this will happen at some point after the new std::thread object gets constructed (as part of the emplace_back). That is nowhere near what must happen for this to work correctly. The only situation where everything works correctly is if the new execution thread begins executing the closure, and evaluates all of the arguments to the function call, before the parent execution thread moves on to the next iteration of the for loop immediately afterwards. The chances of that are not very good.
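One way to stop depending on when the closure runs is to hand the values to the std::thread constructor instead of capturing them: the constructor copies its arguments in the calling thread before it returns, and only `std::ref` passes a genuine reference through. The sketch below uses hypothetical `fill_slot` and `launch_with_copied_args` functions in place of the question's worker.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: writes id * 100 + n into its own output slot.
void fill_slot(int id, int n, int& slot) {
    slot = id * 100 + n;
}

// std::thread copies i and n while it is being constructed, in this thread,
// so their values are fixed no matter when the new thread starts running.
std::vector<int> launch_with_copied_args(int num_threads) {
    std::vector<int> slots(num_threads, -1);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; i++) {
        int n = i + 1;  // recomputed every iteration, like sample_per_thread
        threads.emplace_back(fill_slot, i, n, std::ref(slots[i]));  // std::ref: pass the slot by reference
    }
    for (auto& t : threads) {
        t.join();
    }
    return slots;
}
```

For example, `launch_with_copied_args(4)` returns {1, 102, 203, 304}, because each thread received its own copy of `i` and `n`.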

So, in addition to everything else that goes wrong, sample_per_thread will also be whatever value was last calculated for it.
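The difference between the two kinds of capture can be seen without any threads at all: a by-reference capture reads the variable when the closure runs, while a by-value capture copies it when the closure is created. `capture_demo` is a hypothetical, single-threaded illustration.

```cpp
// Values observed by two closures created at the same moment.
struct Observed {
    int by_reference;
    int by_value;
};

Observed capture_demo() {
    int sample_per_thread = 1;
    auto by_ref = [&sample_per_thread] { return sample_per_thread; };  // reads at call time
    auto by_val = [sample_per_thread] { return sample_per_thread; };   // copied now
    sample_per_thread = 2;  // like the next loop iteration recomputing it
    return {by_ref(), by_val()};
}
```

`capture_demo()` returns {2, 1}: the by-reference closure sees the recalculated value, while the by-value closure keeps the one it was created with.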

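It is entirely possible that all of your execution threads will end up executing this closure, and evaluating all of the parameters that were captured by reference, only after the for loop has finished and i has been destroyed, which makes everything undefined behavior. This is consistent with your output: a thread that reads i late sees a larger ID than the one it was launched with, and one that reads the stale value NUM_THREADS then writes to sin_out[NUM_THREADS] and cos_out[NUM_THREADS], one past the end of both arrays.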

Even if some of the execution threads manage to wake up and smell the coffee a little earlier, you still have no guarantee whatsoever that sample_per_thread will still hold the value calculated for it just before its std::thread object was constructed. In fact, it is practically guaranteed that at least some of the execution threads will read the captured-by-reference sample_per_thread only after it has already been recalculated for the next execution thread's ostensible consumption.
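Computing every thread's share before starting any thread removes the shared, repeatedly overwritten sample_per_thread variable altogether. A minimal sketch with a hypothetical `split_samples` helper, following the question's rule that the first SAMPLE_SIZE % NUM_THREADS threads get one extra sample:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: every thread's sample count, computed up front.
// Assumes num_threads >= 1.
std::vector<std::int64_t> split_samples(std::int64_t total, int num_threads) {
    std::vector<std::int64_t> counts(num_threads, total / num_threads);
    // The first total % num_threads threads take one extra sample each.
    for (std::int64_t i = 0; i < total % num_threads; i++) {
        counts[i] += 1;
    }
    return counts;
}
```

`split_samples(100000, 8)` gives eight counts of 12500, matching the output in the question. Since the vector is not modified while the threads run, each thread can be given its count by value without further synchronization.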

In other words, nothing here works correctly because everything gets captured by reference.
