Fastest method for resizing (rescaling) by an arbitrary factor

Posted on 2025-01-29 02:20:14

I have the following code that resizes a 1D vector with nearest neighbor interpolation, in much the same way you would resize an image. Another term would be resampling, but there seems to be a lot of confusion around these terms (resampling is also a technique in statistics), so I prefer to be more descriptive.

Currently the code looks like this and I need to optimize it:

inline void resizeNearestNeighbor(const int16_t* current, uint32_t currentSize, int16_t* out, uint32_t newSize, uint32_t offset = 0u)
{
    if(currentSize == newSize)
    {
        return;
    }

    const float scaleFactor = static_cast<float>(currentSize) / static_cast<float>(newSize);
    for(uint32_t outIdx = 0; outIdx<newSize; ++outIdx)
    {
        const int currentIdx = static_cast<uint32_t>(outIdx * scaleFactor);
        out[outIdx] = current[(currentIdx + offset)%currentSize];
    }
}

This is of course not very efficient, because taking the integer part of a float by casting is expensive, and I don't think the loop can benefit from vectorization as written. The platform is a Cortex-M7, so if you're familiar with any vectorization techniques on this platform, that would also be very helpful.

The use case of this code is a sound effect that allows for smoothly changing the length of a delay line (hence the additional offset parameter, since it's a ring buffer). Being able to smoothly change the length of a delay line sounds like slowing down or speeding up playback in a tape recorder, only it's in a loop. Without this scaling, there are lots of clicking noises and artifacts. Currently the hardware struggles with all the DSP and this code on top of that and it can't rescale long delay lines in real time.


Comments (2)

与往事干杯 2025-02-05 02:20:14

Since the Cortex-M series is quite limited (even floating point is optional on the M7), I would expect a reasonable speed-up from using Bresenham's midpoint line drawing algorithm.

This algorithm always advances either N or N+1 elements based on the sign of the error term. The modulus does not need full length division: it suffices to compute currentIdx += N + (delta < 0); if (currentIdx >= currentSize) currentIdx -= currentSize;

One can also do a "trial division" of the form if (currentIdx + 64 * (N+1) < currentSize) to ensure that the next 64 elements do not need modular reduction. The M7 has a multiplication unit, but multiplying by shifting is still likely a faster micro-optimisation.

Bresenham's algorithm for line drawing has the form

plotLine(x0, y0, x1, y1)
   dx = x1 - x0
   dy = y1 - y0
   D = 2*dy - dx
   y = y0

   for x from x0 to x1
       plot(x,y)
       if D > 0
           y = y + 1
           D = D - 2*dx
       end if
       D = D + 2*dy

Your application does not have x0,x1,y0,y1, but instead it has directly dy = input_size, dx = output_size.

resample(dx, dy)
   N = dy/dx
   dy = dy % dx
   D = 2*dy - dx;
   y = offset;
   for x from 0 to dx-1
       out[x] = in[y]
       y += N
       if D > 0
          y = y + 1
          D = D - 2*dx
       end if
       D = D + 2*dy
       if (y >= currentSize)
           y -= currentSize

The crucial modification that lets y advance by N > 0 steps per output is to set dy = dy % dx, so that the error term tracks only the fractional part of the step.
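
The pseudocode above could be written in C++ roughly as follows. This is a minimal sketch under assumptions, not a tuned implementation: resizeBresenham is a hypothetical name, the parameters mirror the question's function, and offset is assumed to be smaller than currentSize.

```cpp
#include <cstdint>

// Sketch only: resizeBresenham is a hypothetical name, not from the question.
// Assumes 0 < newSize < 2^30, 0 < currentSize < 2^31 and offset < currentSize.
inline void resizeBresenham(const int16_t* current, uint32_t currentSize,
                            int16_t* out, uint32_t newSize, uint32_t offset = 0u)
{
    const uint32_t step = currentSize / newSize;                      // N: whole input samples per output
    const int32_t rem = static_cast<int32_t>(currentSize % newSize);  // dy after the modulo
    const int32_t dx = static_cast<int32_t>(newSize);
    int32_t D = 2 * rem - dx;                                         // error term
    uint32_t y = offset;
    for (uint32_t x = 0; x < newSize; ++x)
    {
        out[x] = current[y];
        y += step;
        if (D > 0)                                                    // accumulated error calls for one extra step
        {
            ++y;
            D -= 2 * dx;
        }
        D += 2 * rem;
        if (y >= currentSize)                                         // ring-buffer wrap without '%'
        {
            y -= currentSize;
        }
    }
}
```

Because the error term starts at 2*dy - dx, the index is rounded to the nearest input sample instead of truncated, so individual outputs can differ by one sample from the original float version. For example, resizing 5 samples to 3 picks indices 0, 2, 3 here, whereas truncation picks 0, 1, 3.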

One can also use a slightly less accurate fixed-point DDA algorithm:

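// Step through the input per output sample, in 16.16 fixed point.
// Assumes newSize > 0 and offset < currentSize; 64-bit values avoid
// overflow once currentSize reaches 32768.
const uint64_t wrap = static_cast<uint64_t>(currentSize) << 16;
const uint64_t scale = wrap / newSize;
uint64_t y = static_cast<uint64_t>(offset) << 16;
for (uint32_t x = 0; x < newSize; x++) {
    out[x] = in[y >> 16];
    y += scale;
    if (y >= wrap)
        y -= wrap;
}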
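
The block-wise wrap check suggested earlier in this answer (the "trial division" over 64 elements) could be combined with the fixed-point stepping roughly as follows. This is a sketch under assumptions, not a measured optimisation: resizeFixedPointBlocks and kBlock are hypothetical names, the step through the input is taken to be currentSize / newSize in 16.16 fixed point, and 64-bit accumulators are used so that long delay lines do not overflow.

```cpp
#include <cstdint>

// Sketch only: resizeFixedPointBlocks and kBlock are hypothetical names.
// Assumes newSize > 0, currentSize > 0 and offset < currentSize.
inline void resizeFixedPointBlocks(const int16_t* current, uint32_t currentSize,
                                   int16_t* out, uint32_t newSize, uint32_t offset = 0u)
{
    constexpr uint32_t kBlock = 64u;                                  // samples per wrap check
    const uint64_t wrap = static_cast<uint64_t>(currentSize) << 16;  // buffer length in 16.16
    const uint64_t scale = wrap / newSize;                            // input step per output sample
    uint64_t y = static_cast<uint64_t>(offset) << 16;
    uint32_t x = 0;
    while (x < newSize)
    {
        const uint32_t remaining = newSize - x;
        const uint32_t n = remaining < kBlock ? remaining : kBlock;
        if (y + scale * (n - 1) < wrap)
        {
            // Fast path: every read in this block stays inside the buffer.
            for (uint32_t i = 0; i < n; ++i)
            {
                out[x + i] = current[y >> 16];
                y += scale;
            }
            if (y >= wrap)                                            // at most one wrap after the block
            {
                y -= wrap;
            }
        }
        else
        {
            // Slow path: the block straddles the wrap-around point.
            for (uint32_t i = 0; i < n; ++i)
            {
                out[x + i] = current[y >> 16];
                y += scale;
                if (y >= wrap)
                {
                    y -= wrap;
                }
            }
        }
        x += n;
    }
}
```

The fast path leaves the inner loop free of index branches, which may also make it easier for the compiler to unroll. Whether the 64-bit arithmetic costs more than the removed branches save on a Cortex-M7 would need to be measured.
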
半世晨晓 2025-02-05 02:20:14

If you look at currentIdx, you'll note that it is incremented by scaleFactor every time outIdx is incremented by one. Hence, you can replace outIdx * scaleFactor with currentIdx += scaleFactor.

You'd initialize currentIdx to offset, so that's hoisted from the loop as well.

%currentSize is an expensive operation as well, and one that appears to exist only for the non-zero offset case. You might want to treat that case differently and split the loop into two loops (before and after the wrap-around point).
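
The three suggestions could be put together as in the sketch below. It is an illustration under assumptions, not a drop-in replacement: resizeSplitLoops is a hypothetical name, and a 16.16 fixed-point accumulator is swapped in for the float one so that repeated additions do not drift over long buffers.

```cpp
#include <cstdint>

// Sketch only: resizeSplitLoops is a hypothetical name.
// Assumes currentSize > 0, offset < currentSize and
// 0 < newSize <= currentSize * 65536 (so the fixed-point step is non-zero).
inline void resizeSplitLoops(const int16_t* current, uint32_t currentSize,
                             int16_t* out, uint32_t newSize, uint32_t offset = 0u)
{
    const uint64_t wrap = static_cast<uint64_t>(currentSize) << 16;  // buffer length in 16.16
    const uint64_t scale = wrap / newSize;                            // input step per output sample
    uint64_t pos = static_cast<uint64_t>(offset) << 16;               // offset hoisted out of the loop

    // Number of outputs whose source index lies before the wrap-around point.
    uint64_t beforeWrap = (wrap - pos + scale - 1) / scale;
    if (beforeWrap > newSize)
    {
        beforeWrap = newSize;
    }

    uint32_t outIdx = 0;
    for (; outIdx < beforeWrap; ++outIdx, pos += scale)               // no wrap possible here
    {
        out[outIdx] = current[pos >> 16];
    }
    if (pos >= wrap)
    {
        pos -= wrap;                                                  // wrap exactly once
    }
    for (; outIdx < newSize; ++outIdx, pos += scale)                  // a second wrap cannot occur
    {
        out[outIdx] = current[pos >> 16];
    }
}
```

Both loops are now branch-free apart from the loop condition, at the cost of one 64-bit division per call to find the split point.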
