This is exactly what I'm doing here as my final-year project :) except that my project is about tracking the pitch of the human singing voice (and I don't have a robot to play the tune).
The quickest way I can think of is to use the BASS library. It contains a ready-to-use function that can give you FFT data from the default recording device. Take a look at the "livespec" code example that comes with BASS.
By the way, raw FFT data will not be enough to determine the fundamental frequency. You need an algorithm such as the Harmonic Product Spectrum to get the F0.
Another consideration is the audio source. If you are going to do an FFT and apply the Harmonic Product Spectrum to it, you will need to make sure the input has only one audio source. If it contains multiple sources, as in modern songs, there will be too many frequencies to consider.
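BASS is a C library, so here is a rough Python stand-in for what its "livespec" example does, not the BASS API itself: grab a block from the default recording device (using the sounddevice package, my choice) and take its FFT with NumPy.

    import numpy as np
    import sounddevice as sd

    FS = 44100     # sample rate in Hz
    N = 4096       # FFT size

    block = sd.rec(N, samplerate=FS, channels=1, dtype='float64')
    sd.wait()                                    # block until the recording finishes
    spectrum = np.abs(np.fft.rfft(block[:, 0]))  # magnitude spectrum
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)       # frequency of each FFT bin

    # As noted above, the strongest bin is not necessarily the F0;
    # something like the Harmonic Product Spectrum is still needed.
    print(freqs[np.argmax(spectrum)], "Hz is the strongest bin")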
Harmonic Product Spectrum Theory (from http://cnx.org/content/m11714/latest/)
If the input signal is a musical note, then its spectrum should consist of a series of peaks, corresponding to the fundamental frequency with harmonic components at integer multiples of the fundamental frequency. Hence, when we compress the spectrum a number of times (downsampling) and compare it with the original spectrum, we can see that the strongest harmonic peaks line up. The first peak in the original spectrum coincides with the second peak in the spectrum compressed by a factor of two, which coincides with the third peak in the spectrum compressed by a factor of three. Hence, when the various spectra are multiplied together, the result forms a clear peak at the fundamental frequency.
Method
First, we divide the input signal into segments by applying a Hanning window, where the window size and hop size are given as inputs. For each window, we use the Short-Time Fourier Transform to convert the input signal from the time domain to the frequency domain. Once the input is in the frequency domain, we apply the Harmonic Product Spectrum technique to each window.
The HPS involves two steps: downsampling and multiplication. To downsample, we compress the spectrum twice in each window by resampling: the first time, we compress the original spectrum by a factor of two and, the second time, by a factor of three. Once this is completed, we multiply the three spectra together and find the frequency that corresponds to the peak (maximum value). This particular frequency represents the fundamental frequency of that window.
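Here is a minimal NumPy sketch of the windowing + HPS steps just described; the window size, hop size and number of compressed spectra are only example values:

    import numpy as np

    def hps_pitch(frame, fs, n_spectra=3):
        """Estimate the pitch of one frame with the Harmonic Product Spectrum."""
        windowed = frame * np.hanning(len(frame))
        spectrum = np.abs(np.fft.rfft(windowed))
        hps = spectrum.copy()
        # compress by factors 2..n_spectra and multiply with the original
        for k in range(2, n_spectra + 1):
            compressed = spectrum[::k]            # keep every k-th bin
            hps[:len(compressed)] *= compressed
        # only search the region that every compressed copy covers
        peak_bin = np.argmax(hps[:len(spectrum) // n_spectra])
        return peak_bin * fs / len(frame)         # bin index -> Hz

    # toy usage: a 440 Hz tone with a few harmonics, cut into windows
    fs, window, hop = 44100, 4096, 2048
    t = np.arange(fs) / fs
    signal = sum(np.sin(2 * np.pi * 440 * h * t) / h for h in range(1, 5))
    pitches = [hps_pitch(signal[i:i + window], fs)
               for i in range(0, len(signal) - window + 1, hop)]
    print(pitches[0])   # ~441 Hz: the true 440 Hz, quantised to the nearest FFT bin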
Limitations of the HPS method
Some nice features of this method include: it is computationally inexpensive, reasonably resistant to additive and multiplicative noise, and adjustable to different kinds of inputs. For instance, we could change the number of compressed spectra to use, and we could replace the spectral multiplication with a spectral addition. However, since human pitch perception is basically logarithmic, low pitches may be tracked less accurately than high pitches.
Another severe shortfall of the HPS method is that its resolution is only as good as the length of the FFT used to calculate the spectrum. If we perform a short and fast FFT, we are limited in the number of discrete frequencies we can consider. In order to gain a higher resolution in our output (and therefore see less graininess in our pitch output), we need to take a longer FFT, which requires more time.
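To put a number on that resolution limit (the sample rate and FFT length below are just example figures):

    fs, n_fft = 44100, 4096
    print(fs / n_fft)   # ~10.8 Hz between FFT bins
    # Near A1 (55 Hz) adjacent semitones are only ~3.3 Hz apart, so this FFT
    # cannot separate them; resolving low notes needs a longer FFT (and more delay).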
Just a comment: the fundamental harmonic may well be missing from a (harmonic) sound; this doesn't change the perceived pitch. As a limit case, if you take a square wave (say, a C# note) and completely suppress the first harmonic, the perceived note is still C#, in the same octave. In a way, our brain is able to compensate for the absence of some harmonics, even the first, when it guesses a note. Hence, to detect a pitch with frequency-domain techniques you should take into account all the harmonics (local maxima in the magnitude of the Fourier transform) and extract some sort of "greatest common divisor" of their frequencies. Pitch detection is not a trivial problem at all...
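To sketch that "greatest common divisor" idea (the candidate range, step and tolerance below are arbitrary choices, and a real detector would need proper peak picking first):

    import numpy as np

    def f0_from_peaks(peak_freqs, f0_min=50.0, f0_max=1000.0, step=0.5, tol=0.01):
        """Crude common divisor of spectral peak frequencies: the F0 whose
        integer multiples best line up with the observed peaks."""
        peaks = np.asarray(peak_freqs, dtype=float)
        candidates = np.arange(f0_min, f0_max, step)
        # mean distance (as a fraction of F0) of each peak from its nearest harmonic
        errs = np.array([np.mean(np.abs(peaks / f0 - np.round(peaks / f0)))
                         for f0 in candidates])
        # sub-harmonics (F0/2, F0/3, ...) fit just as well, so among the
        # near-best candidates take the largest, then refine against the peaks
        rough = candidates[errs <= errs.min() + tol].max()
        harmonics = np.round(peaks / rough)
        return float(np.mean(peaks / harmonics))

    # peaks at 880, 1320, 1760 Hz: the 440 Hz fundamental itself is absent,
    # yet the common divisor of the harmonics is still ~440 Hz
    print(f0_from_peaks([880.0, 1320.0, 1760.0]))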
DAFX has about 30 pages dedicated to pitch detection, with examples and Matlab code.
Try YAAPT pitch tracking, which detects the fundamental frequency in both the time and frequency domains. You can download the Matlab source code from the link and look for peaks in the FFT output using the spectral processing part.

There is also a Python package: http://bjbschmitt.github.io/AMFM_decompy/pYAAPT.html#
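If you go the Python route, the usage is roughly as below; the module and attribute names follow the package's published example, but treat them as assumptions and check the linked docs:

    # Illustrative pYAAPT usage (AMFM_decompy); the file name is a placeholder.
    import amfm_decompy.pYAAPT as pYAAPT
    import amfm_decompy.basic_tools as basic

    signal = basic.SignalObj('my_recording.wav')   # mono WAV file
    pitch = pYAAPT.yaapt(signal)                   # run the YAAPT tracker with defaults
    print(pitch.samp_values)                       # per-frame F0 in Hz (0 where unvoiced)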
Did you try Wikipedia's article on pitch detection? It contains a few references that may be of interest to you.

In addition, here's a list of DSP applications and libraries where you can poke around. The list only mentions Linux software packages, but many of them are cross-platform, and there's a lot of source code you can look at.
Just FYI, detecting the pitch of the notes in a monophonic recording is within reach of most DSP-savvy people. Detecting the pitches of all notes, including chords and stuff, is a lot harder.
Just a thought - but do you need to process a digital audio stream as input?
If not, consider using a symbolic representation of music (such as MIDI). The pitches of the notes will then be stated explicitly, and you can synthesize sounds (and movements) corresponding to the pitch, rhythm and many other musical parameters extremely easily.
If you need to analyse a digital audio stream (mp3, wav, live input, etc) bear in mind that while pitch detection of simple monophonic sounds is quite advanced, polyphonic pitch detection is an unsolved problem. In this case, you may find my answer to this question helpful.
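To illustrate the symbolic route: in a MIDI file the note numbers are already explicit, and converting a note number to a pitch is a one-line formula. The sketch below uses the mido package, which is just one possible choice (the file name is a placeholder):

    import mido

    def midi_note_to_hz(note):
        # Equal temperament, A4 (MIDI note 69) = 440 Hz
        return 440.0 * 2 ** ((note - 69) / 12)

    for msg in mido.MidiFile('melody.mid'):
        if msg.type == 'note_on' and msg.velocity > 0:
            print(msg.note, round(midi_note_to_hz(msg.note), 1), "Hz")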
For extracting the fundamental frequency of the melody from polyphonic music, you could try the MELODIA plug-in: http://mtg.upf.edu/technologies/melodia

Extracting the F0's of all the instruments in a song (multi-F0 tracking) or transcribing them into notes is an even harder task. Both melody extraction and music transcription are still open research problems, so regardless of the algorithm/tool you use, don't expect to obtain perfect results for either.
If you're trying to detect the notes of a polyphonic recording (multiple notes at the same time) good luck. That's a very tricky problem. I don't know of any way to listen to, say, a recording of a string quartet and have an algorithm separate the four voices. (Wavelets maybe?) If it's just one note at a time, there are several pitch tracking algorithms out there, many of them mentioned in other comments.
The algorithm you want to use will depend on the type of music you are listening to. If you want it to pick up people singing, there are a lot of good algorithms out there designed specifically for voice. (That's where most of the research is.) If you are trying to pick up specific instruments, you'll have to be a bit more creative. Voice algorithms can be simple because the range of the human singing voice is generally limited to about 100-2000 Hz (the speaking range is much narrower). The fundamental frequencies on a piano, however, go from about 27 Hz to 4200 Hz, so you're dealing with a wider range usually ignored by voice pitch detection algorithms.
The waveform of most instruments is going to be fairly complex, with lots of harmonics, so a simple approach like counting zero crossings or just taking the autocorrelation won't work. If you knew roughly what frequency range you were looking in, you could low-pass filter and then count zero crossings. I'd think you'd be better off, though, with a more complex algorithm such as the Harmonic Product Spectrum mentioned by another user, YAAPT ("Yet Another Algorithm for Pitch Tracking"), or something similar.
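A quick sketch of the "low-pass filter and then count zero crossings" idea; the cutoff assumes you already know the pitch sits well below it, and it only behaves on clean monophonic input:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def zero_crossing_pitch(x, fs, cutoff=300.0):
        """Very rough pitch estimate: low-pass to isolate the fundamental,
        then count sign changes (two zero crossings per period)."""
        b, a = butter(4, cutoff / (fs / 2), btype='low')   # 4th-order Butterworth low-pass
        y = filtfilt(b, a, x)
        s = np.signbit(y)
        crossings = np.count_nonzero(s[1:] != s[:-1])
        return crossings / 2 / (len(x) / fs)               # crossings per second / 2 = Hz

    fs = 44100
    t = np.arange(fs) / fs
    tone = sum(np.sin(2 * np.pi * 220 * h * t) / h for h in range(1, 6))  # 220 Hz + harmonics
    print(zero_crossing_pitch(tone, fs))   # ~220 Hz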
One last problem: some instruments, the piano in particular, have the problems of missing fundamentals and inharmonicity. Missing fundamentals can be dealt with by the pitch tracking algorithms... in fact they have to be, since fundamentals are often cut out in electronic transmission... though you'll probably still get some octave errors. Inharmonicity, however, will give you problems if somebody plays a note in the bottom octaves of the piano. Normal pitch tracking algorithms aren't designed to deal with inharmonicity because the human voice is not significantly inharmonic.
You basically need a spectrum analyzer. You might be able to do an FFT on a recording of an analog input, but much depends on the resolution of the recording.
Autocorrelation - http://en.wikipedia.org/wiki/Autocorrelation
Zero-crossing - http://en.wikipedia.org/wiki/Zero_crossing (this method is used in cheap guitar tuners)
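A small sketch of the autocorrelation approach for a monophonic signal (the F0 search range is an arbitrary example):

    import numpy as np

    def autocorr_pitch(x, fs, f_min=50.0, f_max=1000.0):
        """Pitch via autocorrelation: the lag with the strongest self-similarity
        (within the allowed range) is one period of the waveform."""
        x = x - np.mean(x)
        corr = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. len(x)-1
        lag_min, lag_max = int(fs / f_max), int(fs / f_min)
        best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
        return fs / best_lag

    fs = 44100
    t = np.arange(2048) / fs
    tone = sum(np.sin(2 * np.pi * 330 * h * t) / h for h in range(1, 5))  # 330 Hz (E4) + harmonics
    print(autocorr_pitch(tone, fs))   # ~330 Hz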
What immediately comes to my mind:

I am not sure if that works for very polyphonic sounds - maybe googling for "FFT, analysis, melody etc." will return more info on possible problems.

Regards