How can I optimize a function that adds padding to an image using NEON intrinsics?
I'm new to NEON, and while I can do some processing, I struggle with some basic concepts, especially optimizing 2D arrays.
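    #include <stdint.h>
    #include <stdlib.h>

    uint8_t** add_padding(uint8_t** img, int width, int height) {
        /* calloc zero-fills, so the one-pixel border is already 0. */
        uint8_t** padded_image = (uint8_t**)calloc(height + 2, sizeof(uint8_t*));
        if (!padded_image) {
            return NULL;
        }
        for (int i = 0; i < height + 2; i++) {
            padded_image[i] = (uint8_t*)calloc(width + 2, sizeof(uint8_t));
            if (!padded_image[i]) {
                /* Allocation failed: release the rows obtained so far. */
                while (i-- > 0) free(padded_image[i]);
                free(padded_image);
                return NULL;
            }
        }
        /* Copy the source into the interior, offset by one in each direction. */
        for (int i = 1; i < height + 1; i++) {
            for (int j = 1; j < width + 1; j++) {
                padded_image[i][j] = img[i - 1][j - 1];
            }
        }
        return padded_image;
    }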
How can I vectorize the function above using NEON intrinsics in C?
Comments (1)
Two things stand out.
If possible, use a single contiguous memory allocation for the whole image instead of one allocation per row.
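As a rough sketch of the contiguous layout, assuming the source image is also stored as one block of width * height bytes: the helper name add_padding_contiguous is hypothetical, and the padded pixel (x, y) is addressed as out[y * (width + 2) + x].

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pads a contiguous width x height image with a
   one-pixel zero border. The result is a single block of
   (height + 2) * (width + 2) bytes; free it with one free() call. */
uint8_t *add_padding_contiguous(const uint8_t *img, int width, int height) {
    assert(width > 0 && height > 0);
    size_t stride = (size_t)width + 2;          /* bytes per padded row */
    uint8_t *out = (uint8_t *)calloc((size_t)height + 2, stride);
    if (!out) {
        return NULL;
    }
    for (int y = 0; y < height; y++) {
        /* Row y of the source lands in row y + 1, shifted right by one. */
        memcpy(out + (size_t)(y + 1) * stride + 1,
               img + (size_t)y * (size_t)width,
               (size_t)width);
    }
    return out;
}
```

With one block there is a single allocation to check and free, and each row is one contiguous copy, which is easy for memcpy or a hand-written SIMD loop to handle.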
Plan how you are going to handle widths that are not a multiple of the SIMD width. I typically use overlapping SIMD registers, where the last vector is positioned to end exactly at the end of the row and overlaps the previous one.
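A minimal sketch of the overlapping idea for a plain row copy, assuming the row is at least 16 bytes long and that source and destination do not overlap. The name copy_row_overlapped is hypothetical, and the memcpy branch is only there so the code still builds where NEON is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies n bytes (n >= 16) in 16-byte vectors.
   The tail is covered by one extra vector that ends exactly at byte n,
   so it overlaps bytes already copied instead of needing a scalar loop. */
void copy_row_overlapped(uint8_t *dst, const uint8_t *src, size_t n) {
    assert(n >= 16);
#if defined(__ARM_NEON)
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(dst + i, vld1q_u8(src + i));
    }
    if (i < n) {
        /* Re-copies up to 15 bytes; harmless because a copy is idempotent. */
        vst1q_u8(dst + n - 16, vld1q_u8(src + n - 16));
    }
#else
    memcpy(dst, src, n);  /* fallback for non-NEON builds */
#endif
}
```

For the padding function this would be called once per row with dst pointing one byte into the padded row. The overlap trick is safe here because writing the same bytes twice gives the same result; for in-place operations that are not idempotent, the overlapped tail has to be computed from the original data.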
A margin of 1 on every side suggests that you are going to filter the image with a 3x3 kernel, but the filtering can be done efficiently even without an explicit margin.
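One way this might look for a single row, as a hypothetical sketch rather than the fragment originally posted: a 3-tap horizontal maximum, where the pixels just outside the row are treated as zero by shifting a zero vector in with vextq_u8. The name max3_row, the choice of a maximum filter, and the restriction to widths that are a multiple of 16 are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[x] = max(src[x-1], src[x], src[x+1]),
   with out-of-range neighbours read as 0. Assumes n is a non-zero
   multiple of 16 and that dst and src do not overlap. */
void max3_row(uint8_t *dst, const uint8_t *src, size_t n) {
    assert(n >= 16 && n % 16 == 0);
#if defined(__ARM_NEON)
    uint8x16_t prev = vdupq_n_u8(0);   /* virtual zeros left of the row */
    uint8x16_t cur = vld1q_u8(src);
    for (size_t i = 0; i < n; i += 16) {
        /* Past the last block, shift in zeros instead of reading memory. */
        uint8x16_t next = (i + 16 < n) ? vld1q_u8(src + i + 16)
                                       : vdupq_n_u8(0);
        uint8x16_t left = vextq_u8(prev, cur, 15);  /* src[i-1 .. i+14] */
        uint8x16_t right = vextq_u8(cur, next, 1);  /* src[i+1 .. i+16] */
        vst1q_u8(dst + i, vmaxq_u8(vmaxq_u8(left, cur), right));
        prev = cur;
        cur = next;
    }
#else
    /* Scalar fallback with the same border behaviour. */
    for (size_t x = 0; x < n; x++) {
        uint8_t m = src[x];
        uint8_t l = (x > 0) ? src[x - 1] : 0;
        uint8_t r = (x + 1 < n) ? src[x + 1] : 0;
        if (l > m) m = l;
        if (r > m) m = r;
        dst[x] = m;
    }
#endif
}
```

Because the border values come from registers rather than memory, no padded copy of the image is needed for the left and right edges.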
That fragment handles only one row, but it can of course be extended to read from three row pointers.