Fastest algorithm to scale down a 32-bit RGB image

Posted 2024-08-10 17:20:38 · 223 words · 9 views · 0 comments

Which algorithm should I use to scale a 32-bit RGB image down to a custom resolution? The algorithm should average pixels.

For example, suppose I have a 100x100 image and want a new image of size 20x50. The average of the first five pixels of the first source row gives the first pixel of the destination row, and the average of the first two pixels of the first source column gives the first pixel of the destination column.

Currently I first scale down in the X direction and then in the Y direction. This method needs one temporary buffer.

Is there a more optimized method that you know of?

Comments (8)

我做我的改变 2024-08-17 17:20:38

The term you are looking for is "Resampling." In your case you want image resampling. You seem to already be doing linear interpolation, which should be the fastest. Here are ~6 base algorithms. If you really want to delve into the subject look into "resampling kernels."

梦年海沫深 2024-08-17 17:20:38

After you do the standard C optimizations (pointer arithmetic, fixed-point math, etc.), there are also some more clever optimizations to be had. A (very) long time ago, I saw a scaler implementation that scaled the X direction first. In the process of writing out the horizontally scaled image, it rotated the image 90 degrees in memory. This was so that when it came time to do the reads for the Y direction scale, the data in memory would be laid out better for the cache.

This technique depends heavily on the processor that it will run on.
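
The transposed-write idea described above can be sketched roughly as follows. This is a minimal illustration and not the implementation the answer refers to: the names `hscale_transposed` and `scale_two_pass` are hypothetical, pixels are assumed to be 4 packed bytes with no row padding, and the scale factors are assumed to divide the image size exactly. Because each pass writes its output transposed, running the same routine twice yields a row-major result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: box-average each row by `factor` and write the
 * result transposed, so out holds (src_w / factor) rows of src_h pixels.
 * Assumes 4 bytes per pixel and tightly packed rows. */
void hscale_transposed(const uint8_t *src, int src_w, int src_h,
                       uint8_t *out, int factor)
{
    assert(factor >= 1 && src_w % factor == 0);
    int out_w = src_w / factor;
    for (int y = 0; y < src_h; ++y) {
        const uint8_t *row = src + (size_t)y * src_w * 4;
        for (int x = 0; x < out_w; ++x) {
            unsigned sum[4] = {0, 0, 0, 0};
            for (int i = 0; i < factor; ++i)
                for (int c = 0; c < 4; ++c)
                    sum[c] += row[(x * factor + i) * 4 + c];
            /* transposed store: column x of the result becomes row x */
            uint8_t *o = out + ((size_t)x * src_h + y) * 4;
            for (int c = 0; c < 4; ++c)
                o[c] = (uint8_t)(sum[c] / factor);
        }
    }
}

/* Hypothetical driver: tmp needs (src_w / fx) * src_h pixels,
 * dst needs (src_w / fx) * (src_h / fy) pixels. */
void scale_two_pass(const uint8_t *src, int src_w, int src_h,
                    uint8_t *tmp, uint8_t *dst, int fx, int fy)
{
    hscale_transposed(src, src_w, src_h, tmp, fx);      /* X pass */
    hscale_transposed(tmp, src_h, src_w / fx, dst, fy); /* Y pass, reads rows */
}
```

Both passes read memory sequentially, while the writes become strided. Whether that trade pays off is exactly the processor-dependent part, so it is worth measuring on the target hardware.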

混浊又暗下来 2024-08-17 17:20:38

This averages the appropriate pixels.

 w_ratio = src.w / dest.w
 h_ratio = src.h / dest.h

 dest[x,y] = 
    AVG( src[x * w_ratio + xi, y * h_ratio + yi] ) 
      where
           xi in range (0, w_ratio - 1), inc by 1
           yi in range (0, h_ratio - 1), inc by 1

For boundary conditions, use a separate loop (no ifs inside the main loop).

Here is more C-like code.

src and dest are bitmaps with:
* property src[x,y] for the pixel at (x, y)
* property src.w for width
* property src.h for height

pixel has been defined so that

addition

p1 = p1 + p2
is the same as
p1.r = p1.r + p2.r
p1.g = p1.g + p2.g
...

division

p1 = p1 / c
is the same as
p1.r = p1.r / c
p1.g = p1.g / c
...

assignment of the constant 0

p1 = 0
is the same as
p1.r = 0
p1.g = 0
...

For simplicity's sake I won't consider the problem of a pixel component overflowing its integer type...

float w_ratio = (float)src.w / dest.w;   // cast first: int / int would truncate
float h_ratio = (float)src.h / dest.h;
int w_ratio_i = floor(w_ratio);
int h_ratio_i = floor(h_ratio);

// number of source pixels actually summed for each destination pixel
int wxh = w_ratio_i * h_ratio_i;

for (y = 0; y < dest.h; y++)
for (x = 0; x < dest.w; x++){
    pixel temp = 0;

    int srcx, srcy;
    // we have to use the floating point values w_ratio, h_ratio here,
    // otherwise towards the end it can get a little wrong
    // this multiplication can be optimized similarly to Bresenham's line
    srcx = floor(x * w_ratio);
    srcy = floor(y * h_ratio);

    // here we use the floored values, otherwise we might read past the src bitmap
    for(yi = 0; yi < h_ratio_i; yi++)
    for(xi = 0; xi < w_ratio_i; xi++)
            temp += src[srcx + xi, srcy + yi];
    dest[x,y] = temp / wxh;
}

Bresenham's line optimization
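
The comment in the loop above says the `floor(x * w_ratio)` multiplication can be optimized in the manner of Bresenham's line algorithm. One way to read that, sketched here as an assumption and not as the author's code, is to carry an integer error term so that each source start position comes from additions and one comparison. The function name `box_starts` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: fill start[0..dst_n] with floor(i * src_n / dst_n)
 * without any per-pixel multiply or divide. start[i] is the first source
 * index of destination pixel i, and start[dst_n] equals src_n. */
void box_starts(int src_n, int dst_n, int *start)
{
    assert(dst_n >= 1 && src_n >= dst_n);
    int step = src_n / dst_n;   /* whole source pixels per destination pixel */
    int rem  = src_n % dst_n;   /* leftover, distributed by the error term */
    int pos = 0, err = 0;
    for (int i = 0; i <= dst_n; ++i) {
        start[i] = pos;
        pos += step;
        err += rem;
        if (err >= dst_n) {     /* remainder has accumulated to a full pixel */
            err -= dst_n;
            ++pos;
        }
    }
}
```

The table can be computed once per axis and reused for every row or column. The difference `start[i + 1] - start[i]` also gives the exact number of source pixels in each box, which avoids the mismatch between a floored loop count and a fractional divisor.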

烟酉 2024-08-17 17:20:38

You forgot to mention the most important aspect of the question: how much you care about quality. If you don't care exactly how the values of the source pixels are smashed together to create the destination pixel, the fastest algorithm is (at least in almost all cases) the one that produces the worst quality.

If you're tempted to respond with "the fastest algorithm that still yields very good quality", you have essentially covered the entire field of algorithms that deal with image sampling and resizing.

And you have already outlined your initial idea of the algorithm:

Avg of first five pixels of first source row will give first pixel of dest,

Calculating the average value for each channel over the source pixels could be seen as trivial. Are you looking for example code that does that?

Or are you looking for someone to challenge your initial draft of the algorithm with something even faster?

念﹏祤嫣 2024-08-17 17:20:38

If you're looking for a wordy explanation, I've found this article to be helpful. If on the other hand you deal more in mathematical formulae, there is a method of fast image downscaling explained here.

南烟 2024-08-17 17:20:38

It really is a speed/quality trade-off.

First of all, you're correct that doing one dimension then the other is slower than it has to be. Way too many memory reads and writes.

Your big choice is whether to support fractional pixels or not. Your example is 100x100 to 20x50. So 10 pixels map to 1. What if you're going from 100x100 to 21x49? Are you willing to operate at source pixel boundaries, or do you want to pull fractional pixels in? What would you do for 100x100 to 99x99?

You have to tell us what you're willing to accept before we can say what's fastest.

Also tell us the possible extremes of the shrinkage. How many orders of magnitude might the difference between the source and destination be? At some point, sampling representative pixels inside the source area won't be much worse than averaging all the pixels. But you'll have to be careful in choosing representative pixels, or you'll get aliasing with many common patterns.
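
To make the fractional-pixel option concrete, here is a minimal one-dimensional sketch. It assumes a single 8-bit channel and uses floating point for clarity, not speed; the name `shrink_row_fractional` is hypothetical. Each destination sample covers a source interval of width `src_n / dst_n`, and a source pixel cut by an interval edge contributes in proportion to its overlap.

```c
#include <assert.h>

/* Hypothetical helper: shrink one row of a single channel from src_n to
 * dst_n samples (dst_n <= src_n), weighting edge pixels by their overlap. */
void shrink_row_fractional(const unsigned char *src, int src_n,
                           unsigned char *dst, int dst_n)
{
    assert(dst_n >= 1 && dst_n <= src_n);
    double r = (double)src_n / dst_n;            /* source pixels per output */
    for (int i = 0; i < dst_n; ++i) {
        double lo = i * r, hi = (i + 1) * r;     /* covered interval [lo, hi) */
        double sum = 0.0;
        for (int s = (int)lo; s < src_n && s < hi; ++s) {
            double a = s > lo ? s : lo;          /* overlap start */
            double b = s + 1 < hi ? s + 1 : hi;  /* overlap end */
            sum += src[s] * (b - a);
        }
        dst[i] = (unsigned char)(sum / r + 0.5); /* round to nearest */
    }
}
```

Going from 3 samples to 2, for instance, the middle source pixel is split evenly between the two outputs. Restricting the scaler to whole-pixel boundaries drops these weights and the inner comparisons, which is the speed side of the trade-off.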

冬天的雪花 2024-08-17 17:20:38

What you're doing is the optimized method. The only faster one is called nearest neighbor, where you simply grab the middle pixel of the range without trying to average any of them. The quality is significantly worse if there is any detail in the original image, although it might be acceptable if the original is simple.
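
For comparison, nearest-neighbor shrinking as described above might look like the following sketch. The name `shrink_nearest` is hypothetical, pixels are assumed to be packed 32-bit values, and the pitches are given in pixels, not bytes, purely to keep the indexing short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: every destination pixel copies the source pixel at
 * the center of the block it covers; nothing is averaged. */
void shrink_nearest(const uint32_t *src, int src_w, int src_h, int src_pitch,
                    uint32_t *dst, int dst_w, int dst_h, int dst_pitch)
{
    assert(dst_w >= 1 && dst_h >= 1 && dst_w <= src_w && dst_h <= src_h);
    for (int y = 0; y < dst_h; ++y) {
        /* floor((y + 0.5) * src_h / dst_h), computed in integers */
        int sy = (int)(((int64_t)(2 * y + 1) * src_h) / (2 * dst_h));
        const uint32_t *srow = src + (size_t)sy * src_pitch;
        uint32_t *drow = dst + (size_t)y * dst_pitch;
        for (int x = 0; x < dst_w; ++x) {
            int sx = (int)(((int64_t)(2 * x + 1) * src_w) / (2 * dst_w));
            drow[x] = srow[sx];
        }
    }
}
```

Each output pixel costs one read and one write regardless of the scale factor, which is why it is faster than averaging and also why fine detail between the sampled pixels is lost.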

甜`诱少女 2024-08-17 17:20:38

This is what you are looking for in C. It is Egon's approach implemented in C and optimized for speed. The alpha channel is ignored and set to 0, but this can easily be changed. The two inner loops are wrapped in a Duff's-device macro purely for performance; they can be replaced by normal for loops if desired.

Parameters: dst and src are pointers to the 32-bit pixel data, dst_pitch and src_pitch are the lengths of one scanline in bytes, src_width and src_height are the width and height of the source image in pixels, factor_x and factor_y are the scaling denominators in x- and y-directions.

Returns 0 on success and -1 on failure.

#define DUFFS_LOOP(pixel_copy_increment, width) \
{ int n = (width+7)/8;                          \
    switch (width & 7) {                        \
    case 0: do {    pixel_copy_increment;       \
    case 7:     pixel_copy_increment;           \
    case 6:     pixel_copy_increment;           \
    case 5:     pixel_copy_increment;           \
    case 4:     pixel_copy_increment;           \
    case 3:     pixel_copy_increment;           \
    case 2:     pixel_copy_increment;           \
    case 1:     pixel_copy_increment;           \
        } while ( --n > 0 );                    \
    }                                           \
}

int fastscale(unsigned char *dst, int dst_pitch, unsigned char *src, int src_width, int src_height, int src_pitch, int factor_x, int factor_y)
{
    if (factor_x < 1 || factor_y < 1) return -1;

    int temp_r, temp_g, temp_b;
    int i1,i2;

    int dst_width = src_width / factor_x;
    int dst_height = src_height / factor_y;
    if (!dst_height || !dst_width) return -1;
    int factors_mul = factor_x * factor_y;
    int factorx_mul4 = factor_x << 2;
    int src_skip1 = src_pitch - factorx_mul4;
    int src_skip2 = factorx_mul4 - factor_y * src_pitch;
    int src_skip3 = src_pitch * factor_y - dst_width * factorx_mul4;
    int dst_skip = dst_pitch - (dst_width << 2);

    for (i1 = 0; i1 < dst_height; ++i1)
    {
        for (i2 = 0; i2 < dst_width; ++i2)
        {
            temp_r = temp_g = temp_b = 0;
            DUFFS_LOOP ({
                DUFFS_LOOP ({
                    src++; // alpha
                    temp_r += *(src++);
                    temp_g += *(src++);
                    temp_b += *(src++);
                }, factor_x);
                src += src_skip1;
            }, factor_y);
            *(dst++) = 0; // alpha
            *(dst++) = temp_r / factors_mul;
            *(dst++) = temp_g / factors_mul;
            *(dst++) = temp_b / factors_mul;
            src += src_skip2;
        }
        dst += dst_skip;
        src += src_skip3;
    }
    return 0;
}
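
Since the answer notes that the Duff's-device loops can be swapped for ordinary for loops, here is a sketch of that variant. It keeps the same parameters and the same assumptions (alpha in the first byte of each pixel, integer scale factors, pitches in bytes). The name `fastscale_plain` is hypothetical and is not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-loop counterpart of fastscale(): averages each
 * factor_x by factor_y block of the source into one destination pixel.
 * Returns 0 on success and -1 on failure. */
int fastscale_plain(unsigned char *dst, int dst_pitch,
                    const unsigned char *src, int src_width, int src_height,
                    int src_pitch, int factor_x, int factor_y)
{
    assert(dst != NULL && src != NULL);
    if (factor_x < 1 || factor_y < 1) return -1;
    int dst_width = src_width / factor_x;
    int dst_height = src_height / factor_y;
    if (!dst_width || !dst_height) return -1;
    int count = factor_x * factor_y;   /* pixels per block */

    for (int dy = 0; dy < dst_height; ++dy) {
        unsigned char *out = dst + (size_t)dy * dst_pitch;
        for (int dx = 0; dx < dst_width; ++dx, out += 4) {
            int r = 0, g = 0, b = 0;
            for (int yi = 0; yi < factor_y; ++yi) {
                const unsigned char *p = src
                    + (size_t)(dy * factor_y + yi) * src_pitch
                    + (size_t)dx * factor_x * 4;
                for (int xi = 0; xi < factor_x; ++xi, p += 4) {
                    r += p[1];         /* p[0] is alpha and is skipped */
                    g += p[2];
                    b += p[3];
                }
            }
            out[0] = 0;                /* alpha, as in the original */
            out[1] = (unsigned char)(r / count);
            out[2] = (unsigned char)(g / count);
            out[3] = (unsigned char)(b / count);
        }
    }
    return 0;
}
```

Modern compilers typically unroll simple loops like these on their own, so it is worth benchmarking both versions before keeping the macro.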