Float size, matrix multiplication, OpenCL, sockets: weird behavior

Posted on 2024-10-07 20:00:33

I'm generating two matrices using the following function (note some code is omitted):

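```c
#include <stdlib.h>

/* Call site (inside a function; allocation and surrounding code omitted) */
srand(2007);
randomInit(h_A_data, size_A);

/* Fill data[0..size-1] with pseudo-random floats in [0, 1] */
void randomInit(float* data, int size)
{
    int i;
    for (i = 0; i < size; ++i) {
        data[i] = rand() / (float)RAND_MAX;
    }
}
```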

This is called for both matrix A and matrix B, which fills them with values between 0 and 1, e.g. 0.748667. I then multiply the matrices on the CPU and compare the result with a GPU implementation written in OpenCL. The resulting matrix has values around 20, e.g. 23.472757, and the CPU and GPU results match. The CPU implementation is taken from NVIDIA's CUDA toolkit distribution:

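```c
/* Reference (CPU) matrix multiplication C = A * B, where A is hA x wA,
   B is wA x wB, and all matrices are stored row-major */
void computeGold(float* C, const float* A, const float* B, unsigned int hA, unsigned int wA, unsigned int wB)
{
    unsigned int i;
    unsigned int j;
    unsigned int k;
    for (i = 0; i < hA; ++i)
        for (j = 0; j < wB; ++j) {
            /* Accumulate in double to limit rounding error */
            double sum = 0;
            for (k = 0; k < wA; ++k) {
                double a = A[i * wA + k];
                double b = B[k * wB + j];
                sum += a * b;
            }
            C[i * wB + j] = (float)sum;
        }
}
```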
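
Here is a rough back-of-the-envelope check on the magnitudes quoted above. It assumes that rand() / (float)RAND_MAX behaves like independent samples drawn uniformly from [0, 1], so each factor has mean 1/2. Every entry that computeGold writes is a sum of wA products, so its expected value grows linearly with the inner dimension:

$$
\mathbb{E}[C_{ij}] = \sum_{k=0}^{w_A-1} \mathbb{E}[A_{ik}]\,\mathbb{E}[B_{kj}] = w_A \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \frac{w_A}{4}
$$

Values around 20 are therefore consistent with wA somewhere near 80 to 100. The question does not state the actual size. What matters for the text dumps is the number of digits: the entries of A and B have one digit before the decimal point, while the entries of C typically have two.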

The weird thing is that all three matrices occupy the same amount of memory: sizeof(float) * size_A for A, sizeof(float) * size_B for B, and so on. Yet when I dump them to disk, the file for matrix C (the product) is bigger than the files for A and B.

Even more critically, my application transfers these matrices over a network through a socket. In terms of the raw number of bytes all the matrices are the same, yet matrix C takes longer to transfer. The effect becomes more pronounced as the matrices get larger. Why is this?

UPDATE/EDIT: this is the code that writes matrix C to disk:

```c
/* Write matrix C as text: a header, then every element formatted with "%f " */
fprintf(matrix_c_file, "\n\nMatrix C\n");
for (i = 0; i < size_C; i++)
{
    fprintf(matrix_c_file, "%f ", h_C_data[i]);
}
fprintf(matrix_c_file, "\n");
```

When matrices A and B contain only zeros, all three files (for A, B and C) are the same size on disk.

Comments (1)

吃兔兔 2024-10-14 20:00:33

I think that lijie has the correct (albeit terse) answer in the comments. The %f format specifier can result in a string with variable width. Consider the following C code:

```c
#include <stdio.h>

int main(void) {
    printf("%f\n", 0.0);
    printf("%f\n", 3.1415926535897932384626433);
    printf("%f\n", 20.53);
    printf("%f\n", 20.5e38);
}
```

which produces:

```
0.000000
3.141593
20.530000
2050000000000000019963732141023730597888.000000
```

All of the output has the same number of digits after the decimal point (6 by default), but a variable number to the left of the decimal point. If you need the textual representation of your matrix to be a consistent size and you don't mind sacrificing some precision, you can use the %e format specifier instead to force an exponential representation like 2.345e12.
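
The following is a minimal sketch for checking the arithmetic, assuming a C99-conforming C library. The helpers textSizeF and textSizeE are hypothetical names that do not appear in the question. They count how many bytes the question's dump loop would write using "%f " and "%e " respectively. Each calls snprintf with a NULL buffer, so nothing is actually written.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helpers (not from the question). snprintf with a NULL buffer
 * and a size of 0 writes nothing and returns the number of characters the
 * formatted output would need. */

/* Bytes the question's loop writes: fprintf(file, "%f ", m[i]) per element */
size_t textSizeF(const float* m, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        int len = snprintf(NULL, 0, "%f ", (double)m[i]);
        if (len > 0)
            total += (size_t)len;
    }
    return total;
}

/* Same loop with "%e ": one digit before the point, six after, and a
 * two-digit exponent (enough for any float on a C99 library), so every
 * non-negative float takes 13 bytes, e.g. "2.347276e+01 ". */
size_t textSizeE(const float* m, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        int len = snprintf(NULL, 0, "%e ", (double)m[i]);
        if (len > 0)
            total += (size_t)len;
    }
    return total;
}
```

With the numbers from the question, each element of A or B takes 9 bytes as text ("0.748667 "), and each element of C takes 10 bytes ("23.472757 "). C's dump is therefore roughly 11% larger, even though all three arrays use sizeof(float) * n bytes in memory. When A and B are all zeros, C is all zeros too, and every element prints as "0.000000 ". This matches the observation that the three files are then the same size. If the matrices also cross the socket in this text form, the extra bytes would account for the slower transfer; sent as raw float arrays, each matrix is exactly sizeof(float) * n bytes. With "%e " the per-element width no longer depends on the magnitude. Some older runtimes (e.g. Microsoft's C runtime before Visual Studio 2015) print a three-digit exponent, which makes each element one byte longer but still constant.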
