Weird behavior of DPC++ code after running it on an FPGA device

Posted on 2025-01-30 23:02:15

I am using DPC++ to accelerate the KNN algorithm on an FPGA device. The following is the code I wrote for the Euclidean distance. The problem is that FPGA emulation works fine, while running on FPGA hardware (Intel Arria 10, oneAPI) gives -nan for every value in the result buffer, which suggests that something goes wrong in the parallel_for loop. But I can't find anything wrong with it, and the emulation worked.

I am using the Intel DevCloud platform.

std::vector<double> distance_calculation_FPGA(queue& q, const std::vector<std::vector<double>>& dataset, const std::vector<double>& curr_test) {
    std::cout<<"convert 2D to 1D"<<std::endl;
    std::vector<double>linear_dataset;
    for (int i = 0; i < dataset.size(); ++i) {
        for (int j = 0; j < dataset[i].size(); ++j) {
            linear_dataset.push_back(dataset[i][j]);
        }
    }
    std::cout<<"buffering"<<std::endl;
    range<1> num_items{dataset.size()};
    std::vector<double>res;
    //std::cout << "im in" << std::endl;

    res.resize(dataset.size());
    buffer dataset_buf(linear_dataset);
    buffer curr_test_buf(curr_test);
    buffer res_buf(res.data(), num_items);
    
    std::cout<<"submit a job"<<std::endl;
    auto start = std::chrono::high_resolution_clock::now();
    {
        q.submit([&](handler& h) {
            accessor a(dataset_buf, h, read_only);
            accessor b(curr_test_buf, h, read_only);

            accessor dif(res_buf, h, write_only, no_init);
            h.parallel_for(num_items, [=](auto i) {
                for (int j = 0; j < 5; ++j) {
                    dif[i] += (b[j] - a[i * 5 + j]) * (b[j] - a[i * 5 + j]);
                }
                // out << "i : " << i << " i[0]: " << i[0] << " b: " << b[0] << cl::sycl::endl;
            });
        }).wait();
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    /* Iterative distance calculation
        for (int i = 0; i < dataset.size(); ++i) {
            double dis = 0;
            for (int j = 0; j < dataset[i].size(); ++j) {
                dis += (curr_test[j] - dataset[i][j]) * (curr_test[j] - dataset[i][j]);
            }
            res.push_back(dis);
        }
        */
    return res;
}

results with fpga_emulation: ./knn.fpga_emu

results for fpga hardware: ./knn.fpga

Comments (1)

面犯桃花 2025-02-06 23:02:15

A question on your usage: when everything comes back as NaN, we are usually looking at uninitialized memory (or a divide by 0, which you don't have). Is it possible that the ranges are somehow off on the FPGA, and/or that the values aren't properly initialized for the array indices?

Sorry, I know that's pretty basic, but without your dataset I'm not 100% sure I can reproduce it.
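
One way to test the uninitialized-memory theory: the kernel opens `dif` with `write_only, no_init` and then does `dif[i] +=`, which reads the old value before writing. As far as I understand, `no_init` tells the runtime that the previous contents may be discarded, so the zeros from `res.resize()` need not reach the device; the emulator may just happen to see them. I have not verified this on an Arria 10, so treat it as an assumption. The plain C++ sketch below (no SYCL; all function names are hypothetical, and `dims` stands in for the hard-coded 5) imitates the situation by pre-filling the output with NaN and compares in-place accumulation with a local sum that is written once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pattern from the question: read-modify-write on an output slot
// whose starting contents are not guaranteed.
void squared_distances_accumulate_in_place(const std::vector<double>& data,
                                           const std::vector<double>& query,
                                           std::vector<double>& out) {
    const std::size_t dims = query.size();
    assert(data.size() == out.size() * dims);
    for (std::size_t i = 0; i < out.size(); ++i) {
        for (std::size_t j = 0; j < dims; ++j) {
            const double d = query[j] - data[i * dims + j];
            out[i] += d * d;  // depends on whatever out[i] held before
        }
    }
}

// Alternative: accumulate in a local variable and store the result once,
// so the previous contents of the output are never read.
void squared_distances_local_sum(const std::vector<double>& data,
                                 const std::vector<double>& query,
                                 std::vector<double>& out) {
    const std::size_t dims = query.size();
    assert(data.size() == out.size() * dims);
    for (std::size_t i = 0; i < out.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < dims; ++j) {
            const double d = query[j] - data[i * dims + j];
            sum += d * d;
        }
        out[i] = sum;  // single write, no read of the old value
    }
}

// Stand-in for uninitialized device memory (assumption: NaN is used here
// only to make the effect visible; real garbage can be any bit pattern).
std::vector<double> garbage_output(std::size_t n) {
    return std::vector<double>(n, std::numeric_limits<double>::quiet_NaN());
}
```

If this is indeed the cause, the matching change in the kernel would be to declare `double sum = 0.0;` inside the `parallel_for` body, add to `sum` in the inner loop, and finish with `dif[i] = sum;`, which keeps `write_only, no_init` valid.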
