A few questions about FFT and pitch estimation

Posted on 2024-10-18 16:09:41

I need a few clarifications about the FFT and pitch estimation in general.

1.) I read that the larger the block size of your FFT, the better its accuracy, although I know there is also a downside to this. Is this really true? I've been experimenting, and whenever I use a block size of 16384 as opposed to 8192 or 4096, I get worse results. Can someone clarify this for me?

2.) Initially, I believed that getting the pitch from the FFT was simply a matter of finding the bin with the highest intensity. However, after posting and reading some questions here, I think there may be more to this. Can someone suggest how to get a good pitch estimate from the FFT?

3.) Although I already have a good idea, can someone explain in simple terms what the autocorrelation algorithm does? (My idea is that it's basically a compare-and-contrast algorithm, and the candidate with the lowest difference is the one chosen.)

Thanks a lot!


Comments (1)

夜司空 2024-10-25 16:09:41
  1. The downside is processing time, memory consumption and delay. If you want a real-time display, having to wait for an entire frame to fill up before processing can begin may take unacceptably long.
  2. Yes, there is more. Specifically, phase. If you look only at the real value of each bin, the strongest component could also be the bin with the largest negative value (180 degree shift), or one that is zero (90 degree shift), or anything in between. You probably want to do the conversion using complex numbers, and look for the largest absolute value (magnitude).
  3. The algorithm looks for periodic elements in the signal by testing how "similar" a signal is to time-shifted versions of itself. The output is a mapping from time offset to "similarity"; you can then look for the highest value.
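
To make the three points concrete, here is a minimal sketch in plain Python. It is not anyone's reference implementation: the function names (`fft_peak_pitch`, `autocorr_pitch`), the 8000 Hz sample rate, the 2048-sample block, the 80 to 1000 Hz search range and the 200 Hz test tone are all assumptions chosen for illustration. It prints the bin width (sample rate divided by block size) and the block duration (block size divided by sample rate) to show the trade-off from point 1, picks the bin with the largest complex magnitude for point 2, and picks the lag with the highest similarity for point 3.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_peak_pitch(samples, sample_rate):
    """Frequency (Hz) of the bin with the largest complex magnitude."""
    spectrum = fft(samples)
    half = len(samples) // 2
    # Skip bin 0 (DC). abs() of a complex bin does not depend on its phase.
    peak = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate / len(samples)


def autocorr_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Frequency (Hz) whose period is the lag with the highest similarity."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)

    def similarity(lag):
        # Sum of products of the signal with a copy shifted by `lag` samples.
        return sum(a * b for a, b in zip(samples, samples[lag:]))

    best_lag = max(range(min_lag, max_lag + 1), key=similarity)
    return sample_rate / best_lag


# Hypothetical test signal: a pure tone with an arbitrary phase offset.
SAMPLE_RATE = 8000
BLOCK_SIZE = 2048
TRUE_PITCH = 200.0
signal = [
    math.sin(2 * math.pi * TRUE_PITCH * n / SAMPLE_RATE + 1.0)
    for n in range(BLOCK_SIZE)
]

bin_width = SAMPLE_RATE / BLOCK_SIZE       # frequency resolution in Hz
block_seconds = BLOCK_SIZE / SAMPLE_RATE   # time needed to fill one block
fft_estimate = fft_peak_pitch(signal, SAMPLE_RATE)
acf_estimate = autocorr_pitch(signal, SAMPLE_RATE)

print(f"bin width: {bin_width} Hz, block duration: {block_seconds} s")
print(f"FFT peak estimate: {fft_estimate} Hz")
print(f"autocorrelation estimate: {acf_estimate} Hz")
```

The FFT estimate can only land on a multiple of the bin width, so a larger block gives finer frequency steps but takes longer to fill. The autocorrelation estimate is limited to whole-sample lags instead. A real program would normally use an FFT library rather than the short recursive version above, which is included only to keep the example self-contained.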