Testing combinations of multiple arrays with CUDA
I have the code below, written in PHP, and have been reading up on CUDA so I can use the GPU processing power of my old GeForce 8800 Ultra. How do I convert this nested combinations test to CUDA parallel code (if that is even possible)? The total number of combinations of the 2D arrays $a, $b, $c, $d, $e quickly rises into the trillions.
// Each element is array(addValue, capValue, id).
foreach ($a as $aVal) {
  foreach ($b as $bVal) {
    foreach ($c as $cVal) {
      foreach ($d as $dVal) {
        foreach ($e as $eVal) {
          $addSum = $aVal[0] + $bVal[0] + $cVal[0] + $dVal[0] + $eVal[0];
          $capSum = $aVal[1] + $bVal[1] + $cVal[1] + $dVal[1] + $eVal[1];
          if ($capSum <= CAP_LIMIT) {
            // Record the id of the element picked from each array
            // (the original used $aVal[2] for all five keys).
            $tempArr = array("a" => $aVal[2], "b" => $bVal[2], "c" => $cVal[2],
              "d" => $dVal[2], "e" => $eVal[2], "addTotal" => $addSum, "capTotal" => $capSum);
            array_push($topCombinations, $tempArr);
            // Once the list exceeds 1000 entries, sort it and keep the top 900.
            if (count($topCombinations) > 1000) {
              $topCombinations = $ca->arraySortedDescend($topCombinations);
              array_splice($topCombinations, 900);
            }
          }
        }
      }
    }
  }
}
Comments (1)
This is a very wide-open question. It requires both translating between languages and designing a parallel algorithm. I won't go into too much detail, but in a nutshell:
How you parallelize it depends on the size of your arrays ($a through $e). If they are large enough, you could parallelize only the outer one or two loops across the threads of a grid and run the inner loops sequentially inside each thread. If they are not very large, you might want to either flatten two or three of the outer loops into a single thread index, or implement them using 2D or 3D thread blocks and grids in CUDA.
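As a rough illustration of the first option, here is a minimal sketch that flattens the two outer loops into one thread index and runs the loops over $c, $d and $e sequentially in each thread. Everything named here (`Item`, `Best`, `searchKernel`, `findBest`) is hypothetical and not from the question. It also makes simplifying assumptions: values are integers, each thread keeps only its single best combination rather than a top-900 list, the final reduction happens on the host, and CUDA error checking is omitted. It uses no atomics or shared memory, but it is untested on hardware as old as the 8800 Ultra.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical layout mirroring $xVal[0] (add), $xVal[1] (cap), $xVal[2] (id).
struct Item { int add; int cap; int id; };

// Best combination seen by one thread; found == 0 means nothing fit the cap.
struct Best { int found; int addTotal; int capTotal; int ids[5]; };

__global__ void searchKernel(const Item* a, int na, const Item* b, int nb,
                             const Item* c, int nc, const Item* d, int nd,
                             const Item* e, int ne, int capLimit, Best* out)
{
    // Flattened outer loops: thread t handles the pair (a[t / nb], b[t % nb]).
    long long t = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= (long long)na * nb) return;
    int ia = (int)(t / nb), ib = (int)(t % nb);

    Best best;
    best.found = 0; best.addTotal = 0; best.capTotal = 0;
    for (int k = 0; k < 5; ++k) best.ids[k] = 0;

    int add2 = a[ia].add + b[ib].add;
    int cap2 = a[ia].cap + b[ib].cap;

    // Inner three loops run sequentially inside the thread.
    for (int ic = 0; ic < nc; ++ic)
        for (int id = 0; id < nd; ++id)
            for (int ie = 0; ie < ne; ++ie) {
                int cap = cap2 + c[ic].cap + d[id].cap + e[ie].cap;
                int add = add2 + c[ic].add + d[id].add + e[ie].add;
                if (cap <= capLimit && (!best.found || add > best.addTotal)) {
                    best.found = 1; best.addTotal = add; best.capTotal = cap;
                    best.ids[0] = a[ia].id; best.ids[1] = b[ib].id;
                    best.ids[2] = c[ic].id; best.ids[3] = d[id].id;
                    best.ids[4] = e[ie].id;
                }
            }
    out[t] = best;  // one slot per thread, so no atomics are needed
}

// Host wrapper: copies the five lists to the GPU, launches one thread per
// (a, b) pair, then picks the overall best from the per-thread results.
Best findBest(const std::vector<Item> lists[5], int capLimit)
{
    Best result = {0, 0, 0, {0, 0, 0, 0, 0}};
    for (int i = 0; i < 5; ++i)
        if (lists[i].empty()) return result;

    Item* dev[5];
    int n[5];
    for (int i = 0; i < 5; ++i) {
        n[i] = (int)lists[i].size();
        size_t bytes = lists[i].size() * sizeof(Item);
        cudaMalloc(&dev[i], bytes);
        cudaMemcpy(dev[i], lists[i].data(), bytes, cudaMemcpyHostToDevice);
    }

    size_t threads = lists[0].size() * lists[1].size();
    Best* devOut;
    cudaMalloc(&devOut, threads * sizeof(Best));

    int block = 128;
    int grid = (int)((threads + block - 1) / block);
    searchKernel<<<grid, block>>>(dev[0], n[0], dev[1], n[1], dev[2], n[2],
                                  dev[3], n[3], dev[4], n[4], capLimit, devOut);

    std::vector<Best> partial(threads);
    cudaMemcpy(partial.data(), devOut, threads * sizeof(Best),
               cudaMemcpyDeviceToHost);

    // Final reduction on the host over the per-thread winners.
    for (size_t i = 0; i < partial.size(); ++i)
        if (partial[i].found &&
            (!result.found || partial[i].addTotal > result.addTotal))
            result = partial[i];

    for (int i = 0; i < 5; ++i) cudaFree(dev[i]);
    cudaFree(devOut);
    return result;
}
```

Calling `findBest(lists, CAP_LIMIT)` with the five arrays converted to `std::vector<Item>` returns the single highest `addTotal` that fits under the cap. Keeping a true top-N list would need a small per-thread buffer plus a merge step, and if $a and $b are small, a third loop could be folded into the thread index in the same way to create more threads.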