Resonance algorithm for pitch detection

Posted on 2024-09-30 02:12:29

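I have been looking into different ways of detecting the pitch of a tone sung into a microphone.

Since I want to find out how strongly the input resonates with a particular pitch class, I wondered whether I could build some kind of physically based resonance algorithm.

If you hold down the sustain pedal on a piano and sing a tone into it, a string will resonate, provided your tone is close enough to one of the piano's existing pitches.

I would like to simulate this behaviour. How would I go about it? Can anyone help me move this forward?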

╭⌒浅淡时光〆 2024-10-07 02:12:29

Take a look at the autocorrelation function.
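As a rough illustration of that suggestion, here is a minimal autocorrelation sketch. The names `best_autocorr_lag` and `lag_to_hz` and the lag-range parameters are hypothetical, not taken from the answer, and a practical detector would likely need windowing and a better peak-picking rule than "largest value wins".

```c
#include <assert.h>
#include <stddef.h>

/* Returns the lag (in samples) within [min_lag, max_lag] that has the largest
   autocorrelation, or 0 if no lag has a positive score (e.g. silence). */
size_t best_autocorr_lag(const double *x, size_t n, size_t min_lag, size_t max_lag)
{
    assert(min_lag > 0 && min_lag <= max_lag && max_lag < n);
    size_t best_lag = 0;
    double best_score = 0.0;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double sum = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            sum += x[i] * x[i + lag];
        /* Divide by the overlap length so longer lags are not penalized. */
        double score = sum / (double)(n - lag);
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag;
}

/* Converts a lag in samples to a frequency in Hz (0 if no lag was found). */
double lag_to_hz(size_t lag, double sample_rate)
{
    return lag ? sample_rate / (double)lag : 0.0;
}
```

Restricting the search to a lag range that matches the expected vocal range keeps the cost down and avoids picking a multiple of the true period.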

铁轨上的流浪者 2024-10-07 02:12:29

One interesting solution I found is simply feeding the microphone input into a Karplus-Strong algorithm.

Karplus-Strong simulates a plucked string by:

creating a circular buffer (if we are sampling at 44.1 kHz and wish to simulate A4 at 440 Hz, the buffer size would be about 100 elements, since 44100 / 440 ≈ 100.2)
filling it with static (noise) between -1 and 1
walking the circle, each time setting the current value to the average of the previous two (and emitting the current value to the speaker)
optionally applying a damping constant

Now if we add the microphone stream into this process:

x = ( ringBuf[prev] + ringBuf[prev2] ) * 0.5 * 0.998;
micOut[moPtr++] = x;
ringBuf[curr] = x + micIn[miPtr++];

It actually simulates singing into a guitar really accurately. If you get your tone spot on, it truly wails.
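The three-line loop above leaves the index bookkeeping implicit. The following is a minimal sketch of one way to fill it in, assuming a fixed-capacity ring buffer that starts silent and is driven only by the microphone. The `ks_string` type, `ks_init`, `ks_step`, and the `KS_MAX_LEN` capacity are hypothetical names, not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

#define KS_MAX_LEN 2048   /* assumed capacity; enough for low notes at 44.1 kHz */

typedef struct {
    double buf[KS_MAX_LEN];
    size_t len;       /* delay length in samples, roughly sample_rate / frequency */
    size_t pos;       /* index of the oldest sample, overwritten on each step */
    double damping;   /* e.g. 0.998, as in the loop above */
} ks_string;

/* Starts with a silent string so that only the input can excite it. */
void ks_init(ks_string *s, size_t len, double damping)
{
    assert(len >= 2 && len <= KS_MAX_LEN);
    for (size_t i = 0; i < len; i++)
        s->buf[i] = 0.0;
    s->len = len;
    s->pos = 0;
    s->damping = damping;
}

/* One sample step: average the two oldest samples, damp the result,
   store it together with the microphone sample, and return the string output. */
double ks_step(ks_string *s, double mic_sample)
{
    size_t next = (s->pos + 1) % s->len;
    double x = (s->buf[s->pos] + s->buf[next]) * 0.5 * s->damping;
    s->buf[s->pos] = x + mic_sample;
    s->pos = next;
    return x;
}
```

Running one `ks_string` per target note and comparing the output energy of each would give a crude measure of how strongly the sung tone excites each string.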

But there is a severe problem with this approach: consider the pitch generated by a buffer of 100 elements and the pitch generated by a buffer of 101 elements. There is no way to generate any pitch in between these two values, so we are limited to a discrete set of pitches. While this is pretty accurate for low notes (A2 would have a buffer length of ~400), the error grows the higher we go: A7 would need a buffer length of ~12.5. At that length, adjacent buffer sizes (12 and 13) are more than a semitone apart, so the nearest available pitch is off by more than half a semitone.

I cannot see any way of countering this problem. I think the approach has to be dropped.

不气馁 2024-10-07 02:12:29

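An algorithm based purely on the discrete Fourier transform (DFT) has a number of drawbacks.
One problem is time resolution: a DFT works on the samples within a window, so you cannot pinpoint a pitch change inside that window.
Another problem is the DFT's discrete frequency resolution, which may not be good enough for a pitch detector: its bins are spaced linearly in frequency, whereas musical pitch is logarithmic. After all, a DFT can only find waves whose wavelength is an integer fraction of the window size.

A slightly more advanced algorithm could do something like this:

Roughly detect the fundamental frequency (this can be done with a DFT).
Band-pass filter the signal to isolate the fundamental frequency.
Count the number of samples between two peaks in the filtered signal.

By counting samples you get a pitch resolution that matches the sampling rate.
If you want finer resolution than one sample period, you can fit a function (for example a polynomial) to the samples around each peak. Since you have already suppressed the other frequencies, you should be able to do this.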
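One common way to do such a fit is a parabola through the three samples around a peak. The sketch below assumes that choice (the answer only says "a polynomial"), and the names `refine_peak` and `period_to_hz` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fits a parabola through x[i-1], x[i], x[i+1], where i is the index of a
   local peak, and returns the fractional sample position of the vertex. */
double refine_peak(const double *x, size_t n, size_t i)
{
    assert(i > 0 && i + 1 < n);
    double a = x[i - 1], b = x[i], c = x[i + 1];
    double denom = a - 2.0 * b + c;
    if (denom == 0.0)
        return (double)i;   /* the three points are collinear; keep the integer index */
    return (double)i + 0.5 * (a - c) / denom;
}

/* Converts the distance between two refined peak positions to a frequency in Hz. */
double period_to_hz(double peak_a, double peak_b, double sample_rate)
{
    assert(peak_b > peak_a);
    return sample_rate / (peak_b - peak_a);
}
```

Measuring across several periods and dividing by the number of periods would reduce the influence of noise on any single peak.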

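As another answer suggests, you can also use autocorrelation to find the strongest repetition within the signal. I should say, however, that implementing a good autocorrelation pitch detector is not an easy task. Without knowing for sure, I would assume that guitar tuners and similar cheap electronics base their algorithms on a band-pass filter combined with counting the samples between peaks.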

日暮斜阳 2024-10-07 02:12:29

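You can use a damped harmonic oscillator with your input as the driving force. Choose the oscillator's parameters so that its resonant frequency matches the frequency you are looking for.

You will find an analysis of the damped harmonic oscillator in most theoretical physics books on mechanics.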
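A minimal sketch of that idea, assuming the equation of motion x'' + 2ζω₀x' + ω₀²x = input and a semi-implicit Euler discretization. The `oscillator` type, the function names, and the choice of integrator are assumptions, not part of the answer, and this integrator shifts the resonance slightly when ω₀ is not small compared with the sample rate.

```c
#include <assert.h>

typedef struct {
    double x, v;    /* displacement and velocity */
    double w0;      /* natural angular frequency, rad/s */
    double zeta;    /* damping ratio; smaller values give a sharper resonance */
    double dt;      /* sample period, s */
} oscillator;

void osc_init(oscillator *o, double freq_hz, double zeta, double sample_rate)
{
    assert(freq_hz > 0.0 && zeta > 0.0 && sample_rate > 0.0);
    o->x = 0.0;
    o->v = 0.0;
    o->w0 = 2.0 * 3.14159265358979323846 * freq_hz;
    o->zeta = zeta;
    o->dt = 1.0 / sample_rate;
    assert(o->w0 * o->dt < 2.0);   /* stability limit of this integrator */
}

/* Advances the oscillator by one sample with `drive` as the driving force
   and returns the new displacement. */
double osc_step(oscillator *o, double drive)
{
    double accel = drive - 2.0 * o->zeta * o->w0 * o->v - o->w0 * o->w0 * o->x;
    o->v += o->dt * accel;   /* update the velocity first ... */
    o->x += o->dt * o->v;    /* ... then the position, using the new velocity */
    return o->x;
}
```

Summing the squared output over a stretch of input gives a measure of how strongly the input excites the oscillator; one oscillator per target note would mimic the undamped piano strings from the question.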

蓝天 2024-10-07 02:12:29

One approach I've found to be helpful is to generate two reference waves 90 degrees apart (I call them "sine" and "cosine") and take the dot product of the input waveform with those reference waves over some fairly short (say 1/60 second) stretches of the input. That will give you a somewhat noisy indicator of how much of the input at the reference frequency is in phase or out of phase with your reference waves (the square root of the sum of the squares of the two dot products will be the amplitude). With a small window size, you'll notice that the output is rather noisy, but if you filter the output with something like a simple FIR or IIR filter you should get something pretty reasonable.

One nice trick is to generate two amplitude numbers: for the first one, run the sine and cosine amplitudes through two rounds of filtering, then compute the sum of the squares. For the second, run the amplitudes through one round of filtering, then compute the sum of the squares, and then run that through another round of filtering.

Both amplitude measurements will experience the same delay, but the first one will be much more selective than the second; you can thus tell very clearly whether a frequency is 'right on' or is a bit off. Using this approach, it's possible to detect DTMF tones quickly while rejecting tones that are even a few Hz off (off-pitch tones will pick up much more strongly on the 'loose' detector than the tight one).

Sample code:

#include <math.h>

/* Correlates one block of input with the sine and cosine reference waves.
   sine_freq is the phase step per sample, in radians; *sine_phase carries
   the reference phase from one block to the next. */
void process_some_waves(double *input, int len, 
  double *sine_phase, double sine_freq, 
  double *sine_result, double *cosine_result)
{
  int i;
  double phase, sin_tot, cos_tot;
  phase = *sine_phase;
  sin_tot = cos_tot = 0;
  for (i=0; i < len; i++)
  {
    sin_tot += input[i] * sin(phase);
    cos_tot += input[i] * cos(phase);
    phase += sine_freq;
  }
  *sine_result = sin_tot;
  *cosine_result = cos_tot;
  *sine_phase = phase;
}

/* Takes first element in buffer and 'smears' it through buffer with a simple,
   roughly Gaussian response. Note that buff[1] is never written, so it keeps
   its initial value (zero for the static buffers below). */
void simple_fir_filter(double *buff, int buffsize)
{
  int i;

  for (i=buffsize-1; i>=2; i--)
    buff[i] = (buff[i-1] + buff[i-2])/2;
}

#define FILTER_SIZE1 10
#define FILTER_SIZE2 8
#define SECTION_LENGTH 128
#define FREQ whatever  /* placeholder: phase step per sample, in radians */

double sine_buff1[FILTER_SIZE1], sine_buff2[FILTER_SIZE2];
double cosine_buff1[FILTER_SIZE1], cosine_buff2[FILTER_SIZE2];
double combined_buff[FILTER_SIZE2];
double tight_amplitude, loose_amplitude;
double ref_phase;

/* Processes one block of SECTION_LENGTH input samples. */
void handle_some_data(double *input)
{
  /* Put results in first element of filter buffers */
  process_some_waves(input, SECTION_LENGTH, &ref_phase, FREQ, sine_buff1, cosine_buff1);

  /* Run first stage of filtering */

  simple_fir_filter(sine_buff1, FILTER_SIZE1); 
  simple_fir_filter(cosine_buff1, FILTER_SIZE1);

  /* Last element of each array will hold results of filtering. */
  /* Now do second stage */

  sine_buff2[0] = sine_buff1[FILTER_SIZE1-1];
  cosine_buff2[0] = cosine_buff1[FILTER_SIZE1-1];
  combined_buff[0] = sine_buff2[0]*sine_buff2[0] + cosine_buff2[0]*cosine_buff2[0];
  simple_fir_filter(sine_buff2, FILTER_SIZE2); 
  simple_fir_filter(cosine_buff2, FILTER_SIZE2); 
  simple_fir_filter(combined_buff, FILTER_SIZE2); 

  /* Tight: filter twice, then square. Loose: filter, square, filter again. */
  tight_amplitude = sine_buff2[FILTER_SIZE2-1]*sine_buff2[FILTER_SIZE2-1] + 
                    cosine_buff2[FILTER_SIZE2-1]*cosine_buff2[FILTER_SIZE2-1];
  loose_amplitude = combined_buff[FILTER_SIZE2-1];
}

The code here uses 'double' for all math other than array subscripting. In practice, it would almost certainly be faster to replace some of the math with integer math. On machines with floating point, I would expect the best approach to be keeping the phase as a 32-bit integer and using a table of ~4096 'single' (single-precision) sine values (the smaller the table in RAM, the better the cache behaviour). I used code very much like the above on a fixed-point (integer) DSP with great success; the sine and cosine computations in process_some_waves were done in separate "loops", with each "loop" being realized as a single instruction with a "repeat" prefix.
