Confusion about parallel_accumulate in C++ Concurrency in Action

Posted on 2024-08-04 20:09:47

In the following example (Chapter 2), Anthony Williams parallelizes the standard accumulate function. My question is why he computes the maximum number of threads like this:

    unsigned long const max_threads=(length+min_per_thread-1)/min_per_thread;

Why add min_per_thread and subtract 1 before dividing? Why not just:

    unsigned long const max_threads=length/min_per_thread;

The full listing:

    #include <algorithm>   // std::min, std::for_each
    #include <functional>  // std::ref, std::mem_fn
    #include <iterator>    // std::distance, std::advance
    #include <numeric>     // std::accumulate
    #include <thread>
    #include <vector>

    template<typename Iterator,typename T>
    struct accumulate_block
    {
        void operator()(Iterator first,Iterator last,T& result)
        {
            result=std::accumulate(first,last,result);
        }
    };

    template<typename Iterator,typename T>
    T parallel_accumulate(Iterator first,Iterator last,T init)
    {
        unsigned long const length=std::distance(first,last);
        if(!length)
            return init;

        unsigned long const min_per_thread=25;
        unsigned long const max_threads=(length+min_per_thread-1)/min_per_thread;

        unsigned long const hardware_threads=std::thread::hardware_concurrency();
        unsigned long const num_threads=
            std::min(hardware_threads!=0?hardware_threads:2,max_threads);

        unsigned long const block_size=length/num_threads;

        std::vector<T> results(num_threads);
        std::vector<std::thread> threads(num_threads-1);
        Iterator block_start=first;
        // Launch num_threads-1 worker threads; each sums one block.
        for(unsigned long i=0;i<(num_threads-1);++i)
        {
            Iterator block_end=block_start;
            std::advance(block_end,block_size);
            threads[i]=std::thread(accumulate_block<Iterator,T>(),
                                   block_start,block_end,std::ref(results[i]));
            block_start=block_end;
        }
        // The calling thread processes the final block, including any remainder.
        accumulate_block<Iterator,T>()(block_start,last,results[num_threads-1]);
        std::for_each(threads.begin(),threads.end(),
                      std::mem_fn(&std::thread::join));

        return std::accumulate(results.begin(),results.end(),init);
    }

贱人配狗天长地久 2024-08-11 20:09:47

The problem with using

    unsigned long const max_threads=length/min_per_thread;

is that integer division truncates, so a trailing partial block is not counted.

If

    length = 7
    min_per_thread = 5

then

    max_threads = length / min_per_thread = 1

while max_threads should actually be 2: one thread for the first 5 elements and one for the remaining 2. Adding min_per_thread - 1 before dividing rounds the result up instead:

    length + min_per_thread - 1 = 11

    max_threads = (length + min_per_thread - 1) / min_per_thread = 2
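
The rounding argument can be checked at compile time. This is a minimal sketch, not code from the book: the helper names `floor_div` and `ceil_div` are hypothetical and only wrap the two expressions being compared. It also covers a case not mentioned above. With the listing's `min_per_thread` of 25, any input shorter than 25 elements gives 0 under truncating division, so `num_threads` would be 0 and `length/num_threads` would divide by zero.

```cpp
// Hypothetical helpers (not part of the book's listing) wrapping the two
// candidate expressions for max_threads.

// Truncating division: a trailing partial block is dropped.
constexpr unsigned long floor_div(unsigned long length,
                                  unsigned long min_per_thread)
{
    return length / min_per_thread;
}

// Ceiling division: adding (min_per_thread - 1) first rounds the quotient up.
constexpr unsigned long ceil_div(unsigned long length,
                                 unsigned long min_per_thread)
{
    return (length + min_per_thread - 1) / min_per_thread;
}

// The example from the answer: 7 elements, at least 5 per thread.
static_assert(floor_div(7, 5) == 1, "truncation drops the partial block");
static_assert(ceil_div(7, 5) == 2, "rounding up counts the partial block");

// Fewer elements than min_per_thread (25 in the listing).
static_assert(floor_div(7, 25) == 0, "would lead to num_threads == 0");
static_assert(ceil_div(7, 25) == 1, "always at least one thread");

// Exact multiples are unaffected by the adjustment.
static_assert(floor_div(50, 25) == 2, "exact multiple");
static_assert(ceil_div(50, 25) == 2, "exact multiple, no extra thread");
```

Because `parallel_accumulate` returns early when `length` is 0, the rounded-up expression yields at least 1 by the time it is evaluated, which keeps the later division by `num_threads` well defined.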