How do I do bicubic (or other non-linear) interpolation of re-sampled audio data?

Posted 2024-07-27 18:47:20

I'm writing some code that plays back WAV files at different speeds, so that the wave is either slower and lower-pitched, or faster and higher-pitched. I'm currently using simple linear interpolation, like so:

            int newlength = (int)Math.Round(rawdata.Length * lengthMultiplier);
            float[] output = new float[newlength];

            for (int i = 0; i < newlength; i++)
            {
                float realPos = i / lengthMultiplier;
                int iLow = (int)realPos;
                int iHigh = iLow + 1;
                float remainder = realPos - (float)iLow;

                float lowval = 0;
                float highval = 0;
                if ((iLow >= 0) && (iLow < rawdata.Length))
                {
                    lowval = rawdata[iLow];
                }
                if ((iHigh >= 0) && (iHigh < rawdata.Length))
                {
                    highval = rawdata[iHigh];
                }

                output[i] = (highval * remainder) + (lowval * (1 - remainder));
            }

This works fine, but it tends to sound OK only when I lower the frequency of the playback (i.e. slow it down). If I raise the pitch on playback, this method tends to produce high-frequency artifacts, presumably because of the loss of sample information.

I know that bicubic and other interpolation methods resample using more than just the two nearest sample values as in my code example, but I can't find any good code samples (C# preferably) that I could plug in to replace my linear interpolation method here.

Does anyone know of any good examples, or can anyone write a simple bicubic interpolation method? I'll bounty this if I have to. :)

Update: here are a couple of C# implementations of interpolation methods (thanks to Donnie DeBoer for the first one and nosredna for the second):

    public static float InterpolateCubic(float x0, float x1, float x2, float x3, float t)
    {
        float a0, a1, a2, a3;
        a0 = x3 - x2 - x0 + x1;
        a1 = x0 - x1 - a0;
        a2 = x2 - x0;
        a3 = x1;
        return (a0 * (t * t * t)) + (a1 * (t * t)) + (a2 * t) + (a3);
    }

    public static float InterpolateHermite4pt3oX(float x0, float x1, float x2, float x3, float t)
    {
        float c0 = x1;
        float c1 = .5F * (x2 - x0);
        float c2 = x0 - (2.5F * x1) + (2 * x2) - (.5F * x3);
        float c3 = (.5F * (x3 - x0)) + (1.5F * (x1 - x2));
        return (((((c3 * t) + c2) * t) + c1) * t) + c0;
    }

In these functions, x1 is the sample value just before the point you're trying to estimate and x2 is the sample value just after it. x0 is to the left of x1, and x3 is to the right of x2. t goes from 0 to 1 and is the distance from x1 to the point you're estimating.

The Hermite method seems to work pretty well, and appears to reduce the noise somewhat. More importantly it seems to sound better when the wave is sped up.
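For anyone wanting to see how the four-sample window fits into the original loop, here is a minimal sketch. It reuses the Hermite formula from above; the class name `HermiteResampler`, the helper `SampleAt`, and the choice to treat out-of-range samples as silence (as the linear version does) are assumptions made for illustration. It does not address aliasing when speeding up.

```csharp
using System;

public static class HermiteResampler
{
    // Same 4-point, 3rd-order Hermite formula as InterpolateHermite4pt3oX above.
    public static float Hermite(float x0, float x1, float x2, float x3, float t)
    {
        float c0 = x1;
        float c1 = 0.5f * (x2 - x0);
        float c2 = x0 - (2.5f * x1) + (2f * x2) - (0.5f * x3);
        float c3 = (0.5f * (x3 - x0)) + (1.5f * (x1 - x2));
        return ((((c3 * t) + c2) * t) + c1) * t + c0;
    }

    // Assumption: indices outside the buffer read as silence (0),
    // mirroring the bounds checks in the linear version.
    private static float SampleAt(float[] data, int index)
    {
        return (index >= 0 && index < data.Length) ? data[index] : 0f;
    }

    public static float[] Resample(float[] rawdata, double lengthMultiplier)
    {
        int newLength = (int)Math.Round(rawdata.Length * lengthMultiplier);
        float[] output = new float[newLength];

        for (int i = 0; i < newLength; i++)
        {
            // Position in the source, kept in double to limit rounding on long files.
            double realPos = i / lengthMultiplier;
            int i1 = (int)Math.Floor(realPos);   // sample just before the point
            float t = (float)(realPos - i1);     // fractional distance from x1

            output[i] = Hermite(
                SampleAt(rawdata, i1 - 1),
                SampleAt(rawdata, i1),
                SampleAt(rawdata, i1 + 1),
                SampleAt(rawdata, i1 + 2),
                t);
        }
        return output;
    }
}
```

Calling `HermiteResampler.Resample(rawdata, 0.5)` would give a buffer half as long, which plays back an octave higher at the same output rate.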


Comments (6)

星軌x 2024-08-03 18:47:20

My favorite resource for audio interpolating (especially in resampling applications) is Olli Niemitalo's "Elephant" paper.

I've used a couple of these and they sound terrific (much better than a straight cubic solution, which is relatively noisy). There are spline forms, Hermite forms, Watte, parabolic, etc. And they are discussed from an audio point-of-view. This is not just your typical naive polynomial fitting.

And code is included!

To decide which to use, you probably want to start with the table on page 60 which groups the algorithms into operator complexity (how many multiplies, and how many adds). Then choose among the best signal-to-noise solutions--use your ears as a guide to make the final choice. Note: Generally, the higher SNR, the better.

混浊又暗下来 2024-08-03 18:47:20
double InterpCubic(double x0, double x1, double x2, double x3, double t)
{
   double a0, a1, a2, a3;

   a0 = x3 - x2 - x0 + x1;
   a1 = x0 - x1 - a0;
   a2 = x2 - x0;
   a3 = x1;

   // '^' is XOR in C-family languages, so the powers of t are written as products.
   return a0*(t*t*t) + a1*(t*t) + a2*t + a3;
}

where x1 and x2 are the samples being interpolated between, x0 is x1's left neighbor, and x3 is x2's right neighbor. t is [0, 1], denoting the interpolation position between x1 and x2.

像你 2024-08-03 18:47:20

Honestly, cubic interpolation isn't generally much better for audio than linear. A simple suggestion for improving your linear interpolation would be to use an antialiasing filter (before or after the interpolation, depending on whether you are shortening the signal or lengthening it). Another option (though more computationally expensive) is sinc-interpolation, which can be done with very high quality.

We have released some simple, LGPL resampling code that can do both of these as part of WDL (see resample.h).
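To make the sinc idea concrete, here is a rough sketch of a Hann-windowed sinc resampler. It is not the WDL code; the class name `SincResampler`, the `halfWidth` parameter, and the choice of a Hann window are assumptions for illustration. When the signal is being shortened, the cutoff is lowered so the same kernel also acts as the antialiasing filter. It is written for clarity, not speed.

```csharp
using System;

public static class SincResampler
{
    // Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
    private static double Sinc(double x)
    {
        if (Math.Abs(x) < 1e-12) return 1.0;
        double px = Math.PI * x;
        return Math.Sin(px) / px;
    }

    // Hann window over (-halfSpan, halfSpan); zero outside.
    private static double Hann(double x, double halfSpan)
    {
        if (Math.Abs(x) >= halfSpan) return 0.0;
        return 0.5 * (1.0 + Math.Cos(Math.PI * x / halfSpan));
    }

    // halfWidth is the number of sinc zero crossings kept on each side (hypothetical default).
    public static float[] Resample(float[] rawdata, double lengthMultiplier, int halfWidth = 16)
    {
        int newLength = (int)Math.Round(rawdata.Length * lengthMultiplier);
        float[] output = new float[newLength];

        // When shortening (speeding up), lower the cutoff so content above
        // the new Nyquist frequency is filtered out instead of aliasing.
        double cutoff = Math.Min(1.0, lengthMultiplier);
        double support = halfWidth / cutoff;   // kernel half-span in input samples

        for (int i = 0; i < newLength; i++)
        {
            double center = i / lengthMultiplier;
            int first = Math.Max((int)Math.Ceiling(center - support), 0);
            int last = Math.Min((int)Math.Floor(center + support), rawdata.Length - 1);

            double sum = 0.0;
            for (int k = first; k <= last; k++)
            {
                double d = center - k;
                sum += rawdata[k] * cutoff * Sinc(cutoff * d) * Hann(d, support);
            }
            output[i] = (float)sum;
        }
        return output;
    }
}
```

The inner loop touches roughly `2 * halfWidth / cutoff` input samples per output sample, which is where the extra computational cost mentioned above comes from.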

终遇你 2024-08-03 18:47:20

You're looking for polynomial interpolation. The idea is that you pick a number of known data points around the point you want to interpolate, compute an interpolating polynomial from those data points, and then evaluate the polynomial at the interpolation point.

There are other methods. If you can stomach the math, look at signal reconstruction, or google for "signal interpolation".
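As a small illustration of that idea, the sketch below evaluates the Lagrange form of the interpolating polynomial through an arbitrary set of points. The class and method names are made up for this example, and the Lagrange form is just one way of writing the polynomial.

```csharp
using System;

public static class LagrangeInterpolation
{
    // Evaluates, at position x, the unique polynomial of degree < n
    // that passes through the n points (xs[j], ys[j]).
    public static double Evaluate(double[] xs, double[] ys, double x)
    {
        if (xs.Length != ys.Length)
            throw new ArgumentException("xs and ys must have the same length.");

        double result = 0.0;
        for (int j = 0; j < xs.Length; j++)
        {
            // Basis polynomial: 1 at xs[j], 0 at every other known point.
            double basis = 1.0;
            for (int m = 0; m < xs.Length; m++)
            {
                if (m != j) basis *= (x - xs[m]) / (xs[j] - xs[m]);
            }
            result += ys[j] * basis;
        }
        return result;
    }
}
```

For audio, the known points would typically be the four samples around the target position, for example `xs = { -1, 0, 1, 2 }` with `x` between 0 and 1.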

情何以堪。 2024-08-03 18:47:20

I don't have enough reputation to comment on Donnie's answer, so I hope it's okay if I at least partially reference it here in addition to answering the question. I co-maintain the Godot engine's audio system, which used the same coefficients as the polynomial in that answer for a while, and I want to point out that the coefficients in that code snippet are wrong. The code as given produces some pretty severe artifacts, especially with low-frequency audio. For that matter, at least one of the algorithms in the paper that Nosredna's answer points to has some pretty severe low-pass filtering. Godot has switched back to a simple cubic resampling scheme, and it seems to be working well for most users.

I highly recommend using cubic resampling with the following polynomial:

a0 = 3 * y1 - 3 * y2 + y3 - y0;
a1 = 2 * y0 - 5 * y1 + 4 * y2 - y3;
a2 = y2 - y0;
a3 = 2 * y1;

out = (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) / 2;

Where y0, y1, y2, and y3 are successive samples in the original audio, from earliest to latest, mu is the fractional component of the time in samples, and mu2 is the square of mu (which exists entirely to save a multiplication if the compiler fails to optimize correctly).

The math is beyond me but these coefficients have been working well in Godot for some time now with no user complaints.
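Here is a sketch of the coefficients quoted above wrapped in a C# method, in case that is easier to drop in. The method name `InterpolateCubicHalved` is made up for this example and is not taken from Godot's source. Dividing each coefficient by 2 gives `a3/2 = y1`, `a2/2 = 0.5 * (y2 - y0)`, `a1/2 = y0 - 2.5 * y1 + 2 * y2 - 0.5 * y3`, and `a0/2 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2)`, which match `c0` through `c3` of `InterpolateHermite4pt3oX` in the question term by term, so the two should agree up to floating-point rounding.

```csharp
using System;

public static class CubicResample
{
    // The polynomial quoted above; mu is the fractional position between y1 and y2.
    public static float InterpolateCubicHalved(float y0, float y1, float y2, float y3, float mu)
    {
        float mu2 = mu * mu;
        float a0 = 3 * y1 - 3 * y2 + y3 - y0;
        float a1 = 2 * y0 - 5 * y1 + 4 * y2 - y3;
        float a2 = y2 - y0;
        float a3 = 2 * y1;
        return (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) / 2;
    }

    // Copy of InterpolateHermite4pt3oX from the question, kept here for comparison.
    public static float InterpolateHermite4pt3oX(float x0, float x1, float x2, float x3, float t)
    {
        float c0 = x1;
        float c1 = .5F * (x2 - x0);
        float c2 = x0 - (2.5F * x1) + (2 * x2) - (.5F * x3);
        float c3 = (.5F * (x3 - x0)) + (1.5F * (x1 - x2));
        return (((((c3 * t) + c2) * t) + c1) * t) + c0;
    }
}
```

At `mu = 0` the result is `y1` and at `mu = 1` it is `y2`, so the curve passes through both neighboring samples.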

固执像三岁 2024-08-03 18:47:20

This version of cubic Hermite interpolation uses fewer instructions than the others, both with and without Fused Multiply-Add.

float InterpolateHermite(float x0, float x1, float x2, float x3, float t)
{
    float diff = x1 - x2;
    float c1 = x2 - x0;
    float c3 = x3 - x0 + 3 * diff;
    float c2 = -(2 * diff + c1 + c3);
    return 0.5f * ((c3 * t + c2) * t + c1) * t + x1;
}

Hermite interpolation is faster than Lagrange interpolation but has worse phase accuracy. Their amplitude responses have the same range, but Hermite is closer to flat away from 0.5n. Libsoxr's low-quality option has an optimized cubic Lagrange interpolator cubic_stage_fn, under LGPL license, with 6 multiplies, 12 adds. This can be compared to Olli Niemitalo's Lagrange code, which has 9 multiplies, 11 adds.

Hermite is also implemented on this page in Nosredna's InterpolateHermite4pt3oX, in Olli Niemitalo's 2001 paper, and in Ellen Poe's answer.
