Casting an integer to float in OpenCL

Posted on 2025-01-28 01:54:30

This is my first post on Stack Overflow, so bear with me.

I am currently programming an OpenCL kernel and need to use the built-in sqrt function. However, for this to work the argument of the function needs to be a float. I currently have an integer value and need to convert it to float in order to call sqrt(), and then convert the result back to an integer so that it can be stored in the "magOutput" array.

The code below should give a better idea of what I am trying to do:

magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);

In case it helps to understand the application, here is the full code:

__kernel void matrixMultiplication(__global int* input, __global int* xConv, __global int* yConv, __global int* size, __global int* magOutput){

int workItemNum = get_global_id(0); //Work item ID
int workGroupNum = get_group_id(0); //Work group ID
int localGroupID = get_local_id(0); //Work items ID within each work group

// size refers to the total size of a matrix. So for a 3x3 size = 9
float dim = *size;
int dim1 = *size; 

int row = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the row length
int current_row = workItemNum/dim; // the current row is calculated by using the current workitem number divided by the total size of the matrix

int col = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the column length
int current_col = workItemNum % dim1; // the current column is calculated by using the current workitem number modulus by the total size of the matrix

// printf("dimension: %i \n",localGroupID);

// This if statement excludes all boundary pixels from the calculation as you require the neighbouring pixel cells 
// for this calculation
if (current_col == 0 || current_col == col-1 || current_row == 0 || current_row == row - 1){
    /*===============================================================================================================
    * The xConv array holds the kernel convolution of the input greyscale values with the following matrix:
    *
    *                            [-1  0 +1]
    * X - Directional Kernel  =  [-2  0 +2]
    *                            [-1  0 +1]
    * 
    * This scans across the X direction of the image and enhances all edges in the X-direction 
    * ===============================================================================================================
    */
    xConv[workItemNum] =  input[(current_col - 1)*col + current_row - 1]*-1 
             + input[(current_col)*col + current_row - 1]*0 
             + input[(current_col + 1)*col + current_row - 1]*1 
             + input[(current_col - 1)*col + current_row]*-2 
             + input[(current_col)*col + current_row]*0 
             + input[(current_col + 1)*col + current_row]*2 
             + input[(current_col - 1)*col + current_row + 1]*-1 
             + input[(current_col)*col + current_row + 1]*0 
             + input[(current_col + 1)*col + current_row + 1]*1;

    /*===============================================================================================================
    * The yConv array holds the kernel convolution of the input greyscale values with the following matrix:
    *
    *                            [+1 +2 +1]
    * Y - Directional Kernel  =  [ 0  0  0]
    *                            [-1 -2 -1]
    * 
    * This scans across the Y direction of the image and enhances all edges in the Y-direction 
    * ===============================================================================================================
    */
    yConv[workItemNum] = input[(current_col - 1)*col + current_row - 1]*-1 
             + input[(current_col)*col + current_row - 1]*-2 
             + input[(current_col + 1)*col + current_row - 1]*-1 
             + input[(current_col - 1)*col + current_row]*0 
             + input[(current_col)*col + current_row]*0 
             + input[(current_col + 1)*col + current_row]*0 
             + input[(current_col - 1)*col + current_row + 1]*1 
             + input[(current_col)*col + current_row + 1]*2 
             + input[(current_col + 1)*col + current_row + 1]*1;
}

//===============================================================================================================
// Calculates the convolution matrix of the X and Y arrays. Does so by squaring each item of the X and Y arrays,  
// adding them and taking the square root. This is the basic magnitude formula. This is done by each work item
//===============================================================================================================
magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
}

Any suggestions?

Answer by 眼波传意, 2025-02-04 01:54:30

So you essentially have a 2D vector of int data type and want to calculate its length. Most of OpenCL C is just standard C99 code/syntax, so the most straightforward way would be to use standard C-style type casting:

magOutput[workItemNum] = (int)(sqrt((float)xConv[workItemNum]*(float)xConv[workItemNum] + (float)yConv[workItemNum]*(float)yConv[workItemNum])+0.5f);

The +0.5f is for correct rounding: casting a float to int truncates toward zero, so for the non-negative values produced here it always rounds down; for example, (int)3.9f becomes 3. Adding 0.5f immediately before the cast rounds the result to the nearest integer instead. Note that I cast to float first and only then do the squaring; otherwise the int multiplication could overflow.

A possibly faster approach is the following. Here I load the values xConv[workItemNum] and yConv[workItemNum] from global memory only once (global memory access is really slow), cast them to float and store them in private memory (registers) as xConvf and yConvf. Then I do the length calculation and rounding, and finally write the result back to magOutput[workItemNum] in slow global memory.

const float xConvf = (float)xConv[workItemNum], yConvf = (float)yConv[workItemNum];
magOutput[workItemNum] = (int)(sqrt(xConvf*xConvf + yConvf*yConvf)+0.5f);

If you want to get really fancy with the built-in math functionality in OpenCL C, you can also do this (it should be just as fast as the second approach):

magOutput[workItemNum] = (int)(length((float2)((float)xConv[workItemNum], (float)yConv[workItemNum]))+0.5f);

So there are two takeaways for you:

Type casting in OpenCL works just like in C: (float)x.
Reading from and writing to global memory (your kernel parameter arrays) is really slow. Load each necessary value only once into local variables (these live in private memory space) so the numbers are in registers, do the arithmetic in registers, and then write back to global memory once.

While this leaves your OpenCL C code untouched, I recommend this lightweight OpenCL-Wrapper for development with C++. This reduces the OpenCL control logic in your CPU code to about 1/4 and makes development much easier.
