How is the data format of a TensorFlow output tensor defined?

Posted 2022-09-12 22:39:18

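Hi everyone. I have just started learning machine learning and want to build a pedestrian-detection application, that is, detect the pedestrians in an image and mark them. I found some code on GitHub, and after compiling and debugging it I confirmed that it runs, so I would like to adapt it to my project. After a long time debugging I have hit one problem I cannot solve: once the output tensors are obtained, I cannot follow how the author reads values from them to compute the bounding boxes.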

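My main questions are:

Doesn't a tensor store its data in NCHW or NHWC layout? Why does it contain position information?
Why is boxes->host<float>()[i * 4] in the output tensor related to the center point, and boxes->host<float>()[i * 4 + 2] related to the width? How is this layout defined? Is it fixed when the model is defined for training?
How can I determine what the data inside a tensor is made up of?

Below is the author's function for obtaining the bounding boxes: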

void UltraFace::generateBBox(std::vector<FaceInfo> &bbox_collection, MNN::Tensor *scores, MNN::Tensor *boxes) {
    // Both tensors are read as flat float arrays:
    // scores as [num_anchors, 2], boxes as [num_anchors, 4].
    for (int i = 0; i < num_anchors; i++) {
        // The second value of each score pair is the one compared against the threshold.
        if (scores->host<float>()[i * 2 + 1] > score_threshold) {
            FaceInfo rects;
            // priors[i] is used as {center x, center y, width, height} of anchor i;
            // the four box values are treated as offsets relative to that anchor.
            float x_center = boxes->host<float>()[i * 4] * center_variance * priors[i][2] + priors[i][0];
            float y_center = boxes->host<float>()[i * 4 + 1] * center_variance * priors[i][3] + priors[i][1];
            float w = exp(boxes->host<float>()[i * 4 + 2] * size_variance) * priors[i][2];
            float h = exp(boxes->host<float>()[i * 4 + 3] * size_variance) * priors[i][3];

            // Convert center/size to corner coordinates, clip, and scale to pixels.
            rects.x1 = clip(x_center - w / 2.0, 1) * image_w;
            rects.y1 = clip(y_center - h / 2.0, 1) * image_h;
            rects.x2 = clip(x_center + w / 2.0, 1) * image_w;
            rects.y2 = clip(y_center + h / 2.0, 1) * image_h;
            rects.score = clip(scores->host<float>()[i * 2 + 1], 1);
            bbox_collection.push_back(rects);
        }
    }
}
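
To make the indexing in the function above easier to discuss, here is a minimal standalone sketch of the same arithmetic on plain float buffers, without MNN. It rests on assumptions that the snippet suggests but does not confirm: that scores and boxes are contiguous row-major buffers with logical shapes [num_anchors, 2] and [num_anchors, 4], that each prior is {center x, center y, width, height} in normalized units, and that clip(x, 1) clamps to [0, 1]. The names Box, Prior, clip01 and decodeBoxes are hypothetical and not taken from the author's code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the author's FaceInfo; corner coordinates in pixels.
struct Box {
    float x1, y1, x2, y2, score;
};

// One prior (anchor) box in normalized units: center x, center y, width, height.
struct Prior {
    float cx, cy, w, h;
};

// Clamp to [0, 1]; assumed to behave like the author's clip(x, 1).
inline float clip01(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// scores: flat buffer, logical shape [num_anchors, 2]; element (i, c) is scores[i * 2 + c].
// boxes:  flat buffer, logical shape [num_anchors, 4]; element (i, k) is boxes[i * 4 + k].
std::vector<Box> decodeBoxes(const std::vector<float>& scores,
                             const std::vector<float>& boxes,
                             const std::vector<Prior>& priors,
                             float center_variance, float size_variance,
                             float score_threshold, float image_w, float image_h) {
    std::vector<Box> out;
    for (std::size_t i = 0; i < priors.size(); ++i) {
        const float score = scores[i * 2 + 1];
        if (score <= score_threshold) continue;  // keep only anchors above the threshold

        const float* d = &boxes[i * 4];  // the four offsets for anchor i
        const Prior& p = priors[i];

        // Center offsets are scaled by the prior's size and added to its center.
        const float cx = d[0] * center_variance * p.w + p.cx;
        const float cy = d[1] * center_variance * p.h + p.cy;
        // Size offsets are exponentiated and multiplied by the prior's size.
        const float w = std::exp(d[2] * size_variance) * p.w;
        const float h = std::exp(d[3] * size_variance) * p.h;

        // Center/size to corners, clamped to the image and scaled to pixels.
        out.push_back({clip01(cx - w / 2) * image_w, clip01(cy - h / 2) * image_h,
                       clip01(cx + w / 2) * image_w, clip01(cy + h / 2) * image_h,
                       clip01(score)});
    }
    return out;
}
```

With all four offsets of an anchor set to zero, the decoded box is simply that anchor's prior scaled to pixels, which suggests the box values are offsets relative to the priors rather than raw image data.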

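I am new to this and do not know where to start. I would appreciate it if someone could help by suggesting some search keywords; my main problem is that I do not know what to search for to find relevant information.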
