Pitch detection using the FFT values returned by computeSpectrum()

Posted on 2024-11-14 16:05:17

  • I'm developing in ActionScript 3.0 for Flash Player 10.3
  • I'm using computeSpectrum() on a loaded .mp3
  • Running on *Event.ENTER_FRAME* to get a snapshot of each sample in a ByteArray
  • The ByteArray contains 512 values (256 for each channel). These values are the FFT spectrum, ranging from 0 to 1.
  • I can't use the peak frequency of each sample (as I found out!) because the highest value is not necessarily the fundamental frequency!
    As a result I'm getting lots of random values all over the place!
    Of course I'm getting some correct ones too, but that's not enough!

I found out about auto-correlation...
Can someone give me an example of how I could use it?

Or links, or example scripts even from other scripting languages to get a grip on it?

Regards
initcode

Comments (4)

银河中√捞星星 2024-11-21 16:05:17

Sounds like you already understand how to get an FFT spectrum, right?

spectrum http://flic.kr/p/7notw6

But if you're looking for the fundamental (green dot), you can't just use the highest peak. It's not necessarily the fundamental. In my example, the actual fundamental is 100 Hz, but the highest peak is 300 Hz.

There are a lot of different ways you could find the true fundamental, and each works better in different contexts. One thread on comp.dsp mentions "FFT, cepstrum, auto/cross-correlation, AMDF/ASDF".

For a simple example, each of the red dots is 100 Hz away from its neighbor, so if you used a peak-finding algorithm and then averaged together the distance between each harmonic and the next, you'd find the fundamental. This would fail, though, if any of the peaks were missed, or extra peaks were included, or if the signal was symmetrical and only contained odd harmonics (1f, 3f, 5f). You'd need to find the mode, then throw away outliers, and then average. This is probably an error-prone method.
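As a rough illustration of the "find the mode, throw away outliers, then average" idea, here is a minimal Python sketch. The function name, the 10 Hz quantization step, and the 20% tolerance are assumptions made for this example, and the peak list is invented data standing in for the output of a peak finder.

```python
from collections import Counter

def fundamental_from_peaks(peak_freqs, resolution_hz=10.0, tolerance=0.2):
    """Estimate the fundamental from the spacing between detected peaks.

    resolution_hz and tolerance are illustrative choices, not tuned values.
    """
    peaks = sorted(peak_freqs)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    if not spacings:
        raise ValueError("need at least two peaks")
    # Quantize the spacings so near-identical values fall in the same bucket.
    buckets = [round(s / resolution_hz) * resolution_hz for s in spacings]
    mode = Counter(buckets).most_common(1)[0][0]
    # Discard outliers, e.g. the double gap left by a missed harmonic.
    kept = [s for s in spacings if abs(s - mode) <= tolerance * mode]
    if not kept:
        return mode
    return sum(kept) / len(kept)

# Harmonics of roughly 100 Hz, with the 500 Hz peak missed by the peak finder.
peaks = [100.0, 201.0, 299.0, 400.0, 600.0, 702.0]
estimate = fundamental_from_peaks(peaks)
print(estimate)
```

Note that this still returns twice the fundamental for a signal with only odd harmonics, since the spacing between its peaks is 2f, which is one of the failure cases mentioned above.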

You could also do an autocorrelation of the original waveform. Conceptually, this means sliding a copy of the waveform past itself and finding the delay at which it best lines up with itself (which will be one complete cycle). In a practical implementation, though, we use the FFT to speed it up. Autocorrelation is basically

  • IFFT(FFT(signal)⋅FFT(signal)*)

where * means complex conjugate, which for a real signal corresponds to time reversal in the time domain. In Python, for instance:

from scipy.signal import fftconvolve
correlation = fftconvolve(sig, sig[::-1], mode='full')

and the source for fftconvolve() is relatively simple: https://github.com/scipy/scipy/blob/master/scipy/signal/signaltools.py#L133
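To make the "sliding a copy past itself" picture concrete, here is a minimal sketch of the direct time-domain version in plain Python, without the FFT speed-up. The function name, the 50 to 1000 Hz search range, and the 8000 Hz sample rate are assumptions for this example. The test signal mimics the case above: a 100 Hz fundamental with a stronger third harmonic at 300 Hz.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate pitch in Hz from the lag at which the signal best matches itself."""
    n = len(samples)
    # Only search lags that correspond to plausible pitches (assumed range).
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Slide a copy of the waveform by `lag` samples and sum the overlap.
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

sample_rate = 8000
# 0.2 s of a 100 Hz tone whose 300 Hz harmonic is twice as strong.
signal = [
    0.5 * math.sin(2 * math.pi * 100 * t / sample_rate)
    + 1.0 * math.sin(2 * math.pi * 300 * t / sample_rate)
    for t in range(1600)
]
pitch = autocorrelation_pitch(signal, sample_rate)
print(pitch)
```

Even though the 300 Hz component dominates the spectrum, the best-matching lag is one full period of the 100 Hz fundamental. The direct loop is O(N·lags), which is why the FFT-based form is preferred for longer buffers.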

笔芯 2024-11-21 16:05:17

You can use the Harmonic Product Spectrum method to estimate the distance (frequency difference) between overtone peaks in a frequency spectrum (FFT results), even if some peaks are missing, as long as there are not too many spurious frequency peaks (noise).

To do a Harmonic Product Spectrum, print the FFT out on semi-transparent paper and roll it up into a cylinder (or do the equivalent in software). Wrap the cylinder tighter and tighter until the greatest number of peaks overlap. The circumference will be a good estimate of the pitch. This works for any musical sound that has lots of harmonics, even if the fundamental pitch frequency peak is missing or weak.
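One common way to do "the equivalent in software" is to multiply the magnitude spectrum by copies of itself compressed by factors of 2, 3, and so on, and take the bin where the product is largest. The sketch below assumes that formulation; the function name, the choice of three harmonics, and the synthetic 256-bin spectrum (a weak fundamental at bin 10 with a dominant third harmonic at bin 30) are made up for illustration.

```python
def harmonic_product_spectrum(magnitudes, num_harmonics=3):
    """Return the bin index whose first `num_harmonics` harmonics line up best."""
    limit = len(magnitudes) // num_harmonics
    best_bin, best_product = 1, 0.0
    for k in range(1, limit):  # start at 1 to skip the DC bin
        product = 1.0
        for h in range(1, num_harmonics + 1):
            # Bin k*h is where the h-th harmonic of bin k would appear.
            product *= magnitudes[k * h]
        if product > best_product:
            best_bin, best_product = k, product
    return best_bin

# Synthetic magnitude spectrum: small noise floor plus harmonic peaks.
spectrum = [0.01] * 256
for bin_index, level in [(10, 0.2), (20, 0.5), (30, 1.0), (40, 0.4), (50, 0.3)]:
    spectrum[bin_index] = level

fundamental_bin = harmonic_product_spectrum(spectrum)
print(fundamental_bin)
```

Because the values are multiplied, a peak that is completely absent (zero) would cancel the product, so in practice a noise floor or a sum of log magnitudes is often used instead. That choice is left open here.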

留蓝 2024-11-21 16:05:17

What are you trying to do?

I haven't used computeSpectrum() before, but I spent the first half of my career as a DSP engineer.

If it does what the docs say, then you don't need to autocorrelate the results.

In your byte array, the index represents the frequency bin, and the value at that index represents the magnitude of that particular frequency.

If by pitch detection you mean finding the strongest frequency, then you need to loop through the byte array and calculate sqrt(left*left+right*right) for each index. Find the max value of these. The index of the max value represents the strongest frequency.

Assuming fs=44.1kHz, and i is your index, then the strongest frequency is

f = (i/255) * (44100 / 2);

Keep in mind that you are limited by the bin spacing for frequency resolution. If you need higher resolution, you need to interpolate the data.
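The steps above can be sketched in Python as follows, using a plain list of 512 floats as a stand-in for the ByteArray (left channel first, then right, as described in the question). The function name is hypothetical, and parabolic interpolation over the peak and its two neighbors is just one assumed way to interpolate; the bin-to-frequency mapping follows the formula given above.

```python
import math

def strongest_frequency(values, sample_rate=44100.0):
    """Return (peak_bin, interpolated_frequency_hz) for a two-channel spectrum."""
    half = len(values) // 2
    left, right = values[:half], values[half:]
    # Combine the channels into one magnitude per frequency bin.
    mags = [math.sqrt(l * l + r * r) for l, r in zip(left, right)]
    peak = max(range(half), key=mags.__getitem__)
    offset = 0.0
    if 0 < peak < half - 1:
        # Fit a parabola through the peak and its neighbors to refine the bin.
        a, b, c = mags[peak - 1], mags[peak], mags[peak + 1]
        denom = a - 2 * b + c
        if denom != 0:
            offset = 0.5 * (a - c) / denom
    # Same mapping as f = (i/255) * (44100 / 2) for 256 bins.
    freq = (peak + offset) / (half - 1) * (sample_rate / 2)
    return peak, freq

# Made-up spectrum: a peak at bin 3 with some energy leaking into bin 2.
values = [0.0] * 512
values[2], values[256 + 2] = 0.3, 0.4
values[3], values[256 + 3] = 0.6, 0.8
peak_bin, freq_hz = strongest_frequency(values)
print(peak_bin, freq_hz)
```

Here the interpolated frequency lands slightly below the center of bin 3 because bin 2 carries more energy than bin 4. As the other comments point out, the strongest frequency found this way is not necessarily the fundamental.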
