使用帧之间的相位变化从 FFT Bin 中提取精确频率

发布于 2024-10-10 15:55:42 字数 3533 浏览 0 评论 0原文

我一直在浏览这篇精彩的文章:http://blogs。 zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

虽然很精彩,但它却非常困难和繁重。这种材料真的让我很紧张。

我从 Stefan 的代码模块中提取了数学知识,该模块计算给定 bin 的确切频率。但我不明白最后的计算。有人能给我解释一下最后的数学结构吗?

在深入研究代码之前,让我先设置一下场景:

  • 假设我们设置 fftFrameSize = 1024,因此我们正在处理 512+1 个 bin

  • 例如,Bin[1] 的理想频率适合帧中的单个波。在 40KHz 采样率下,tOneFrame = 1024/40K 秒 = 1/40s,因此 Bin[1] 理想情况下会收集 40Hz 信号。

  • 设置 osamp (overSample) = 4,我们以 256 的步长沿着输入信号前进。因此,第一个分析检查字节 0 到 1023,然后检查 256 到 1279,等等。请注意,每个浮点都被处理 4 次。

...

void calcBins( 
              long fftFrameSize, 
              long osamp, 
              float sampleRate, 
              float * floats, 
              BIN * bins
              )
{
    /* initialize our static arrays */
    static float gFFTworksp[2*MAX_FRAME_LENGTH];
    static float gLastPhase[MAX_FRAME_LENGTH/2+1];

    static long gInit = 0;
    if (! gInit) 
    {
        memset(gFFTworksp, 0, 2*MAX_FRAME_LENGTH*sizeof(float));
        memset(gLastPhase, 0, (MAX_FRAME_LENGTH/2+1)*sizeof(float));
        gInit = 1;
    }

    /* do windowing and re,im interleave */
    for (long k = 0; k < fftFrameSize; k++) 
    {
        double window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
        gFFTworksp[2*k] = floats[k] * window;
        printf("sinValue: %f", gFFTworksp[2*k]);
        gFFTworksp[2*k+1] = 0.;
    }

    /* do transform */
    smbFft(gFFTworksp, fftFrameSize, -1);

    printf("\n");

    /* this is the analysis step */
    for (long k = 0; k <= fftFrameSize/2; k++) 
    {
        /* de-interlace FFT buffer */
        double real = gFFTworksp[2*k];
        double imag = gFFTworksp[2*k+1];

        /* compute magnitude and phase */
        double magn = 2.*sqrt(real*real + imag*imag);
        double phase = atan2(imag,real);

        /* compute phase difference */
        double phaseDiff = phase - gLastPhase[k];
        gLastPhase[k] = phase;

        /* subtract expected phase difference */
        double binPhaseOffset = M_TWOPI * (double)k / (double)osamp;
        double deltaPhase = phaseDiff - binPhaseOffset;

        /* map delta phase into [-Pi, Pi) interval */
        // better, but obfuscatory...
        //    deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);

        while (deltaPhase >= M_PI)
            deltaPhase -= M_TWOPI;
        while (deltaPhase < -M_PI)
            deltaPhase += M_TWOPI;

(编辑:)现在我不明白的是:

        // Get deviation from bin frequency from the +/- Pi interval 
        // Compute the k-th partials' true frequency    

        // Start with bin's ideal frequency
        double bin0Freq = (double)sampleRate / (double)fftFrameSize;
        bins[k].idealFreq = (double)k * bin0Freq;

        // Add deltaFreq
        double sampleTime = 1. / (double)sampleRate;
        double samplesInStep = (double)fftFrameSize / (double)osamp;
        double stepTime = sampleTime * samplesInStep;
        double deltaTime = stepTime;        

        // Definition of frequency is rate of change of phase, i.e. f = dϕ/dt
        // double deltaPhaseUnit = deltaPhase / M_TWOPI; // range [-.5, .5)
        double freqAdjust = (1. / M_TWOPI) * deltaPhase / deltaTime; 

        // Actual freq <-- WHY ???
        bins[k].freq = bins[k].idealFreq + freqAdjust;
    }
}

我只是看不清它,即使它似乎在盯着脸。有人可以从头开始一步一步解释这个过程吗?

I have been looking through this fantastic article: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

While being fantastic, it is extremely hard and heavy going. This material is really stretching me.

I have extracted the maths from Stefan's code module that calculates the exact frequency for a given bin. But I don't understand the last calculation. Can someone explain to me the mathematical construction at the end?

Before digging into the code, let me set the scene:

  • Let's say we set fftFrameSize = 1024, so we are dealing with 512+1 bins

  • As an example, Bin[1]'s ideal frequency fits a single wave in the frame. At a sample rate of 40KHz, tOneFrame = 1024/40K seconds = 1/40s, so Bin[1] would ideally be collecting a 40Hz signal.

  • Setting osamp (overSample) = 4, we progress along our input signal in steps of 256. So the first analysis examines bytes zero through 1023, then 256 through 1279, etc. Note each float gets processed 4 times.

...

void calcBins( 
              long fftFrameSize, 
              long osamp, 
              float sampleRate, 
              float * floats, 
              BIN * bins
              )
{
    /* initialize our static arrays */
    static float gFFTworksp[2*MAX_FRAME_LENGTH];
    static float gLastPhase[MAX_FRAME_LENGTH/2+1];

    static long gInit = 0;
    if (! gInit) 
    {
        memset(gFFTworksp, 0, 2*MAX_FRAME_LENGTH*sizeof(float));
        memset(gLastPhase, 0, (MAX_FRAME_LENGTH/2+1)*sizeof(float));
        gInit = 1;
    }

    /* do windowing and re,im interleave */
    for (long k = 0; k < fftFrameSize; k++) 
    {
        double window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
        gFFTworksp[2*k] = floats[k] * window;
        printf("sinValue: %f", gFFTworksp[2*k]);
        gFFTworksp[2*k+1] = 0.;
    }

    /* do transform */
    smbFft(gFFTworksp, fftFrameSize, -1);

    printf("\n");

    /* this is the analysis step */
    for (long k = 0; k <= fftFrameSize/2; k++) 
    {
        /* de-interlace FFT buffer */
        double real = gFFTworksp[2*k];
        double imag = gFFTworksp[2*k+1];

        /* compute magnitude and phase */
        double magn = 2.*sqrt(real*real + imag*imag);
        double phase = atan2(imag,real);

        /* compute phase difference */
        double phaseDiff = phase - gLastPhase[k];
        gLastPhase[k] = phase;

        /* subtract expected phase difference */
        double binPhaseOffset = M_TWOPI * (double)k / (double)osamp;
        double deltaPhase = phaseDiff - binPhaseOffset;

        /* map delta phase into [-Pi, Pi) interval */
        // better, but obfuscatory...
        //    deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);

        while (deltaPhase >= M_PI)
            deltaPhase -= M_TWOPI;
        while (deltaPhase < -M_PI)
            deltaPhase += M_TWOPI;

(EDIT:) Now the bit I don't get:

        // Get deviation from bin frequency from the +/- Pi interval 
        // Compute the k-th partials' true frequency    

        // Start with bin's ideal frequency
        double bin0Freq = (double)sampleRate / (double)fftFrameSize;
        bins[k].idealFreq = (double)k * bin0Freq;

        // Add deltaFreq
        double sampleTime = 1. / (double)sampleRate;
        double samplesInStep = (double)fftFrameSize / (double)osamp;
        double stepTime = sampleTime * samplesInStep;
        double deltaTime = stepTime;        

        // Definition of frequency is rate of change of phase, i.e. f = dϕ/dt
        // double deltaPhaseUnit = deltaPhase / M_TWOPI; // range [-.5, .5)
        double freqAdjust = (1. / M_TWOPI) * deltaPhase / deltaTime; 

        // Actual freq <-- WHY ???
        bins[k].freq = bins[k].idealFreq + freqAdjust;
    }
}

I just can't see it clearly, even though it seems to be staring in the face. Could someone please explain this process from scratch, step by step?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

や莫失莫忘 2024-10-17 15:55:43

基本原理非常简单。如果给定分量与 bin 频率完全匹配,则其相位不会从一个 FT 变化到下一个 FT。然而,如果频率与 bin 频率不完全对应,则连续 FT 之间将会出现相位变化。频率增量就是:

delta_freq = delta_phase / delta_time

并且组件频率的精确估计将是:

freq_est = bin_freq + delta_freq

The basic principle is very simple. If a given component exactly matches a bin frequency then its phase will not change from one FT to the next. However if the frequency does not correspond exactly with the bin frequency then there will be a phase change between successive FTs. The frequency delta is just:

delta_freq = delta_phase / delta_time

and the refined estimate of the frequency of the component will then be:

freq_est = bin_freq + delta_freq
世界和平 2024-10-17 15:55:43

我自己为 Performous 实现了这个算法。当您在某个时间偏移处进行另一个 FFT 时,您预计相位会根据偏移而变化,即,对于信号中存在的所有频率,相隔 256 个样本进行的两个 FFT 应该具有 256 个样本的相位差(这假设信号本身是稳定的,对于像 256 个样本这样的短期来说,这是一个很好的假设)。

现在,从 FFT 获得的实际相位值不是样本,而是相位角,因此它们会根据频率而有所不同。在下面的代码中,phaseStep 值是每个 bin 所需的转换因子,即对于对应于 bin x 的频率,相移将为 x *phaseStep。对于箱中心频率,x 将为整数(箱编号),但对于实际检测到的频率,它可以是任何实数。

const double freqPerBin = SAMPLE_RATE / FFT_N;
const double phaseStep = 2.0 * M_PI * FFT_STEP / FFT_N;

校正的工作原理是假设频段中的信号具有频段中心频率,然后计算其预期相移。从实际偏移中减去该预期偏移,留下误差。取余数(模 2 pi)(-pi 到 pi 范围),并使用 bin 中心 + 校正来计算最终频率。

// process phase difference
double delta = phase - m_fftLastPhase[k];
m_fftLastPhase[k] = phase;
delta -= k * phaseStep;  // subtract expected phase difference
delta = remainder(delta, 2.0 * M_PI);  // map delta phase into +/- M_PI interval
delta /= phaseStep;  // calculate diff from bin center frequency
double freq = (k + delta) * freqPerBin;  // calculate the true frequency

请注意,许多相邻的 bin 通常最终会校正到相同的频率,因为无论哪种方式,增量校正都可以高达 0.5 * FFT_N / FFT_STEP bin,因此您使用的 FFT_STEP 越小,校正的距离就越远(但这会增加处理能力)需要以及由于不准确而导致的不精确)。

我希望这有帮助:)

I have implemented this algorithm for Performous myself. When you take another FFT at a time offset, you expect the phase to change according to the offset, i.e. two FFTs taken 256 samples apart should have a phase difference of 256 samples for all frequencies present in the signal (this assumes that the signals themselves are steady, which is a good assumption for short periods like 256 samples).

Now, the actual phase values you get from FFT are not in samples but in phase angle, so they will be different depending on the frequency. In the following code the phaseStep value is the conversion factor needed per bin, i.e. for frequency corresponding to bin x, the phase shift will be x * phaseStep. For bin center frequencies x would be an integer (the bin number) but for actual detected frequencies it may be any real number.

const double freqPerBin = SAMPLE_RATE / FFT_N;
const double phaseStep = 2.0 * M_PI * FFT_STEP / FFT_N;

The correction works by assuming that the signal in a bin has the bin center frequency and then calculating the expected phase shift for that. This expected shift is substracted from the actual shift, leaving the error. A remainder (modulo 2 pi) is taken (-pi to pi range) and the final frequency is calculated with bin center + correction.

// process phase difference
double delta = phase - m_fftLastPhase[k];
m_fftLastPhase[k] = phase;
delta -= k * phaseStep;  // subtract expected phase difference
delta = remainder(delta, 2.0 * M_PI);  // map delta phase into +/- M_PI interval
delta /= phaseStep;  // calculate diff from bin center frequency
double freq = (k + delta) * freqPerBin;  // calculate the true frequency

Notice that many adjacent bins often end up corrected to the same frequency because the delta correction can be up to 0.5 * FFT_N / FFT_STEP bins either way so the smaller FFT_STEP you use, the further away will corrections be possible (but this increases the processing power needed as well as imprecision due to inaccuracies).

I hope this helps :)

情感失落者 2024-10-17 15:55:43

我终于明白了这一点;我真的必须从头开始推导它。我知道会有一些简单的方法来推导它,我(通常)的错误是试图遵循其他人的逻辑而不是使用我自己的常识。

这个谜题需要两把钥匙来解锁。

...

for (int k = 0; k <= fftFrameSize/2; k++) 
{
    // compute magnitude and phase 
    bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
    bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);

    // Compute phase difference Δϕ fo bin[k]
    double deltaPhase;
    {
        double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
        gLastPhase[k] = bins[k].phase;

        // Subtract expected phase difference <-- FIRST KEY
        // Think of a single wave in a 1024 float frame, with osamp = 4
        //   if the first sample catches it at phase = 0, the next will 
        //   catch it at pi/2 ie 1/4 * 2pi
        double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
        deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;

        // Wrap delta phase into [-Pi, Pi) interval 
        deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
    }

    // say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin[0] thru bin[512]
    // then bin[1] holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
    double bin0Freq = (double)sampleRate / (double)fftFrameSize;
    bins[k].idealFreq = (double)k * bin0Freq;

    // Consider Δϕ for bin[k] between hops.
    // write as 2π / m.
    // so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred   <-- SECOND KEY
    double m = M_TWOPI / deltaPhase;

    // so, m hops should have bin[k].idealFreq * t_mHops cycles.  plus this extra 1.
    // 
    // bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds 
    //   => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
    double tFrame = fftFrameSize / sampleRate;
    double tHop = tFrame / osamp;
    double t_mHops = m * tHop;

    bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
}

Finally I have figured this out; really I had to derive it from scratch. I knew there would be some simple way to derive it, my (usual) mistake was to attempt to follow other people's logic rather than use my own common sense.

This puzzle takes two keys to unlock it.

...

for (int k = 0; k <= fftFrameSize/2; k++) 
{
    // compute magnitude and phase 
    bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
    bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);

    // Compute phase difference Δϕ fo bin[k]
    double deltaPhase;
    {
        double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
        gLastPhase[k] = bins[k].phase;

        // Subtract expected phase difference <-- FIRST KEY
        // Think of a single wave in a 1024 float frame, with osamp = 4
        //   if the first sample catches it at phase = 0, the next will 
        //   catch it at pi/2 ie 1/4 * 2pi
        double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
        deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;

        // Wrap delta phase into [-Pi, Pi) interval 
        deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
    }

    // say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin[0] thru bin[512]
    // then bin[1] holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
    double bin0Freq = (double)sampleRate / (double)fftFrameSize;
    bins[k].idealFreq = (double)k * bin0Freq;

    // Consider Δϕ for bin[k] between hops.
    // write as 2π / m.
    // so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred   <-- SECOND KEY
    double m = M_TWOPI / deltaPhase;

    // so, m hops should have bin[k].idealFreq * t_mHops cycles.  plus this extra 1.
    // 
    // bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds 
    //   => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
    double tFrame = fftFrameSize / sampleRate;
    double tHop = tFrame / osamp;
    double t_mHops = m * tHop;

    bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
}
孤檠 2024-10-17 15:55:43

这是相位声码器方法使用的频率估计技术。

如果及时查看(固定频率和固定幅度)正弦波上的单个点,相位将随时间提前,提前量与频率成正比。或者您也可以进行相反的操作:如果您测量正弦曲线的相位在任何单位时间内的变化量,您就可以计算该正弦曲线的频率。

相位声码器使用两个FFT参考两个FFT窗口来估计相位,并且两个FFT的偏移量是两个相位测量在时间上的距离。从那时起,您就得到了该 FFT 箱的频率估计(FFT 箱大致是一个滤波器,用于隔离正弦分量或适合该箱的其他足够窄带的信号)。

为了使该方法发挥作用,使用中的FFT bin附近的频谱必须相当稳定,例如频率不改变等。这是相位声码器所需要的假设。

This is the frequency estimation technique used by phase vocoder methods.

If you look at a single point on a (fixed frequency and fixed amplitude) sine wave in time, the phase will advance with time by an amount proportional to the frequency. Or you can do the converse: if you measure how much the phase of a sinusoid changes over any unit of time, you can calculate the frequency of that sinusoid.

A phase vocoder uses two FFTs to estimate phase with reference to two FFT windows, and the offset of the two FFTs is the distance between the 2 phase measurements in time. From thence, you have your frequency estimate for that FFT bin (an FFT bin being roughly a filter to isolate a sinusoidal component or other sufficiently narrowband signal that fits within that bin).

For this method to work, the spectrum near the FFT bin in use has to be fairly stationary, e.g. not changing in frequency, etc. That's the assumption a phase vocoder requires.

薄情伤 2024-10-17 15:55:43

也许这会有所帮助。将 FFT bin 视为指定的小时钟或转子,每个时钟或转子都以 bin 的频率旋转。对于稳定的信号,可以使用您未获得的位中的数学来预测转子的(理论上)下一个位置。
针对这个“应该”(理想)位置,您可以计算几个有用的东西:(1)与相邻帧的 bin 中的相位的差异,相位声码器使用它来更好地bin 频率的估计,或者 (2) 更一般的相位偏差,它是音频中音符开始或某些其他事件的积极指标。

Maybe this will help. Think of the FFT bins as specifying little clocks or rotors, each spinning at the frequency of the bin. For a stable signal, the (theoretical) next position of the rotor can be predicted using the math in the bit you don't get.
Against this "should be" (ideal) position, you can compute several useful things: (1) the difference with the phase in a bin of an adjacent frame, which is used by a phase vocoder to better estimate of the bin frequency, or (2) more generally phase deviation, which is a positive indicator of a note onset or some other event in the audio.

森林散布 2024-10-17 15:55:43

恰好落在 bin 频率上的信号频率将 bin 相位提前 2π 的整数倍。由于 FFT 的周期性,与 bin 频率相对应的 bin 相位是 2π 的倍数,因此在这种情况下没有相位变化。你提到的文章也解释了这一点。

Signal frequencies that fall exactly on a bin frequency advance bin phase by integer multiples of 2π. Since the bin phases that correspond to the bin frequencies are multiples of 2π due to the periodic nature of the FFT there is no phase change in this case. The article you mention also explains this.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文