Non-deterministic behavior of native functions in an OpenCL kernel


TL;DR: Using native_log2() produces non-deterministic behavior in an OpenCL kernel, while using log2() produces deterministic behavior. Why is this happening?

So I have the function below acting as a helper function for an OpenCL kernel, and I was using the native_ version of log2 (native_log2) to improve performance.

When I compared the results produced by the kernel with those of the original program, I realized that in most cases the kernel produces the right values; sometimes, however, it produces an incorrect one (around 30 incorrect values in 500k function calls). VERY IMPORTANT: the errors are not always on the same computations. I am processing multiple input files, and the errors seem to occur randomly in different sets of files on different runs. That is, the results are non-deterministic.

After some tests I narrowed the problem down to the function below and found that swapping native_log2 for log2 produces the correct value 100% of the time. All those typecasts look ugly, but the log2() and floor() functions only accept double/float, while my input/output must be integers.

My device is an NVIDIA 940MX GPU and only supports OpenCL 1.2. The OpenCL 1.2 documentation states:

A subset of functions from table 6.8 that are defined with the native_ prefix. These functions may map to one or more native device instructions and will typically have better performance compared to the corresponding functions (without the native_ prefix) described in table 6.8. The accuracy (and in some cases the input range(s)) of these functions is implementation-defined.

Clearly I am supposed to expect some errors when using native_ functions, but the documentation says nothing about whether those errors are deterministic. For example, if native_log2(64.0f) returned 5.9999995f instead of 6.0f, the floor would drop to 5 and the returned length would be 2 too small.
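One way to see how far native_log2 drifts from log2 on a particular device is a small probe kernel along these lines (the kernel name and buffer layout are illustrative, not part of my program); launched over 32 work-items, it records both variants for every exact power of two so the host can diff them:

__kernel void probe_log2(__global float *out_precise, __global float *out_native)
{
    int i = (int)get_global_id(0);    // exponent 0..31, one work-item each
    float x = (float)(1u << i);       // exact power of two: 2^i
    out_precise[i] = log2(x);         // accuracy bounded by the spec (3 ulp)
    out_native[i]  = native_log2(x);  // accuracy is implementation-defined
}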

Can someone point me in the right direction as to why I am facing this strange behavior?

int xGetExpGolombNumberOfBits(int value){
    unsigned int uiLength2 = 1;
    // Map the signed value to its unsigned Exp-Golomb code number:
    // value > 0 maps to 2*value, value <= 0 maps to -2*value + 1
    unsigned int uiTemp2 = select((unsigned int)( value << 1 ), ( (unsigned int)( -value ) << 1 ) + 1, value <= 0);

    // These magic numbers (7 and 128) stand in for two named constants,
    // written out literally here for the sake of clarity
    while( uiTemp2 > 128 )
    {
      uiLength2 += ( 7 << 1 );
      uiTemp2  >>=   7;
    }

    // floor(log2(uiTemp2)) is the index of the highest set bit of uiTemp2
    return uiLength2 + (((int)floor(native_log2((float)uiTemp2))) << 1);
}
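For completeness, the float round-trip can be avoided altogether: for any unsigned x > 0, floor(log2(x)) equals 31 - clz(x), and clz() is a built-in integer function in OpenCL 1.2. Here is a sketch of the same helper with the typecasts gone (my own rewrite, not the original program's code; it relies on uiTemp2 always being >= 1, which the select() mapping guarantees):

int xGetExpGolombNumberOfBits_clz(int value){
    unsigned int uiLength2 = 1;
    unsigned int uiTemp2 = select((unsigned int)( value << 1 ), ( (unsigned int)( -value ) << 1 ) + 1, value <= 0);

    while( uiTemp2 > 128 )   // same magic numbers as above
    {
      uiLength2 += ( 7 << 1 );
      uiTemp2  >>=   7;
    }

    // uiTemp2 is in [1, 128] here, so clz(uiTemp2) <= 31 and the
    // subtraction below computes floor(log2(uiTemp2)) exactly
    return uiLength2 + (int)((31u - clz(uiTemp2)) << 1);
}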
