Python频率检测

发布于 2024-08-28 16:25:46 字数 273 浏览 15 评论 0原文

好吧，我想做的是一种音频处理软件，它可以检测流行频率，如果该频率播放足够长的时间（几毫秒），我知道我得到了积极的匹配。我知道我需要使用 FFT 或类似的东西，但在这个数学领域我很糟糕，我确实在互联网上搜索过，但没有找到只能做到这一点的代码。

我试图实现的目标是让自己成为一个自定义协议来通过声音发送数据，每秒需要非常低的比特率（5-10bps），但我在传输端也非常有限，因此接收软件需要能够自定义（无法使用实际的硬件/软件调制解调器）我也希望这只是软件（除了声卡之外没有其他硬件）

非常感谢您的帮助。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

孤云独去闲 2024-09-04 16:25:46

aubio 库已用 SWIG 封装，因此可以由 Python 使用。它们的众多功能包括多种音调检测/估计方法，包括 YIN算法和一些谐波梳算法。

但是，如果您想要更简单的东西，我不久前编写了一些用于音调估计的代码，您可以接受或保留它。它不会像使用aubio中的算法那么准确，但它可能足以满足您的需求。我基本上只是将数据的 FFT 乘以一个窗口（在本例中为 Blackman 窗口），对 FFT 值进行平方，找到具有最高值的 bin，并使用最大值的对数在峰值周围进行二次插值及其两个相邻值来找到基频。我从我找到的一些论文中得到了二次插值。

它在测试音上工作得相当好，但它不如上面提到的其他方法那么强大或准确。可以通过增加块大小来提高准确性（或通过减少块大小来降低准确性）。块大小应为 2 的倍数才能充分利用 FFT。另外，我只是确定每个块的基本音高，没有重叠。我使用 PyAudio 播放声音，同时写出估计的音高。

源代码：

# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
                p.get_format_from_width(wf.getsampwidth()),
                channels = wf.getnchannels(),
                rate = RATE,
                output = True)

# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and times by the hamming window
    indata = np.array(wave.struct.unpack("%dh"%(len(data)/swidth),\
                                         data))*window
    # Take the fft and square each value
    fftData=abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0,y1,y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which+x1)*RATE/chunk
        print "The freq is %f Hz." % (thefreq)
    else:
        thefreq = which*RATE/chunk
        print "The freq is %f Hz." % (thefreq)
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()

The aubio libraries have been wrapped with SWIG and can thus be used by Python. Among their many features include several methods for pitch detection/estimation including the YIN algorithm and some harmonic comb algorithms.

However, if you want something simpler, I wrote some code for pitch estimation some time ago and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data times a window (a Blackman window in this case), squared the FFT values, found the bin that had the highest value, and used a quadratic interpolation around the peak using the log of the max value and its two neighboring values to find the fundamental frequency. The quadratic interpolation I took from some paper that I found.

It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a multiple of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk with no overlap. I used PyAudio to play the sound through while writing out the estimated pitch.

Source Code:

# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
                p.get_format_from_width(wf.getsampwidth()),
                channels = wf.getnchannels(),
                rate = RATE,
                output = True)

# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and times by the hamming window
    indata = np.array(wave.struct.unpack("%dh"%(len(data)/swidth),\
                                         data))*window
    # Take the fft and square each value
    fftData=abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0,y1,y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which+x1)*RATE/chunk
        print "The freq is %f Hz." % (thefreq)
    else:
        thefreq = which*RATE/chunk
        print "The freq is %f Hz." % (thefreq)
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()

回复收藏 0 原文

陈年往事 2024-09-04 16:25:46

如果您要使用FSK（频移键控）来编码数据，您'使用 Goertzel 算法可能会更好，这样您就可以只检查您想要的频率，而不是完整的 DFT/FFT。

回复收藏 0 原文

你列表最软的妹 2024-09-04 16:25:46

您可以从此处找到声音上滑动窗口的频谱然后通过查找此处。

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc
np.random.seed(0)

# Sine sample with a frequency of 5hz and add some noise
sr = 32  # sampling rate
y = np.linspace(0, 5 * 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)

# Generate frquency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)

# Calculate percentage for a frequency range 
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%

You can find the frequency spectrum of the sliding windows over your sound from here and then check the presence of the prevalent frequency band via finding the area under the frequency spectrum curve for that band from here.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc
np.random.seed(0)

# Sine sample with a frequency of 5hz and add some noise
sr = 32  # sampling rate
y = np.linspace(0, 5 * 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)

# Generate frquency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)

# Calculate percentage for a frequency range 
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%

回复收藏 0 原文