使用Python中的互相关比较两个音频（.wav）文件

发布于 2025-02-06 14:51:46 字数 1045 浏览 2 评论 0原文

我需要比较两个音频文件以检查它们之间的相似性。因此，我已经使用Python使用了互相关方法。这里是我的代码：

from scipy.io import wavfile
from scipy import signal
import numpy as np


sample_rate_a, data_a = wavfile.read('new.wav')
sample_rate_b, data_b = wavfile.read('result.wav')

data_a = np.float32(data_a)
data_b = np.float32(data_b)
corr = signal.correlate(data_a, data_b)
lags = signal.correlation_lags(len(data_a), len(data_b))
corr = corr / np.max(corr)
def Average(l): 
    avg = sum(l) / len(l) 
    return avg
average = Average(corr) 
  

lag = lags[np.argmax(corr)]
print(corr)
print("Lag =",lag, "np max=", np.max(corr))
print("np.min=",np.min(corr)) 
print("Average of my_list is",abs(average))

我打印了几个值，例如归一化的相关值，滞后和其归一化的最小值和最大值的平均值，以了解我的输出。这是我的输出：

[-3.5679664e-09 -1.1893221e-09  2.3786442e-09 ...  1.1893221e-09
 -1.1893221e-09 -4.7572883e-09]
Lag = 2886023 np max= 1.0
np.min= -1.8993026
Average of my_list is 6.370856069729521e-05

我对此输出有些困惑，因为我无法理解这些值的含义。谁能帮助我弄清楚这些输出值是什么？对于两个音频文件的相似性，我只需要获得一个百分比值。

谢谢

原文

I need to compare two audio files to check the similarity between them. So that I have used the cross-correlation method using python.Here is my code:

from scipy.io import wavfile
from scipy import signal
import numpy as np


sample_rate_a, data_a = wavfile.read('new.wav')
sample_rate_b, data_b = wavfile.read('result.wav')

data_a = np.float32(data_a)
data_b = np.float32(data_b)
corr = signal.correlate(data_a, data_b)
lags = signal.correlation_lags(len(data_a), len(data_b))
corr = corr / np.max(corr)
def Average(l): 
    avg = sum(l) / len(l) 
    return avg
average = Average(corr) 
  

lag = lags[np.argmax(corr)]
print(corr)
print("Lag =",lag, "np max=", np.max(corr))
print("np.min=",np.min(corr)) 
print("Average of my_list is",abs(average))

I have printed several values such as normalized correlation values,lag and the average of its normalized min and max values to get an idea of my output. here is my output:

[-3.5679664e-09 -1.1893221e-09  2.3786442e-09 ...  1.1893221e-09
 -1.1893221e-09 -4.7572883e-09]
Lag = 2886023 np max= 1.0
np.min= -1.8993026
Average of my_list is 6.370856069729521e-05

I am a bit confused about this output because I can not understand the meaning of these values. Can anyone help me to figure out what are these output values? I need to get only a percentage value for the similarity of the two audio files.

Thank you

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

聽兲甴掵 2025-02-13 14:51:46

我不知道如何解释您的输出，但是在下面有一个代码，可以使用Python获得两个音频文件的相似性，从而获得100到100的数字，它可以通过从音频文件中生成指纹并使用它们基于它们进行比较。互动相关性

需要 chromaprint 和 ffmpeg 已安装，它也不适用于简短的音频文件，如果这是一个问题，您始终可以降低音频的速度，例如指南请注意，这将增加一点噪音。

# correlation.py
import subprocess
import numpy
# seconds to sample audio file for
sample_time = 500# number of points to scan cross correlation over
span = 150# step size (in points) of cross correlation
step = 1# minimum number of points that must overlap in cross correlation
# exception is raised if this cannot be met
min_overlap = 20# report match when cross correlation has a peak exceeding threshold
threshold = 0.5
# calculate fingerprint
def calculate_fingerprints(filename):
    fpcalc_out = subprocess.getoutput('fpcalc -raw -length %i %s' % (sample_time, filename))
    fingerprint_index = fpcalc_out.find('FINGERPRINT=') + 12
    # convert fingerprint to list of integers
    fingerprints = list(map(int, fpcalc_out[fingerprint_index:].split(',')))      
    return fingerprints  
    # returns correlation between lists
def correlation(listx, listy):
    if len(listx) == 0 or len(listy) == 0:
        # Error checking in main program should prevent us from ever being
        # able to get here.     
        raise Exception('Empty lists cannot be correlated.')    
    if len(listx) > len(listy):     
        listx = listx[:len(listy)]  
    elif len(listx) < len(listy):       
        listy = listy[:len(listx)]      

    covariance = 0  
    for i in range(len(listx)):     
        covariance += 32 - bin(listx[i] ^ listy[i]).count("1")  
    covariance = covariance / float(len(listx))     
    return covariance/32  
    # return cross correlation, with listy offset from listx
def cross_correlation(listx, listy, offset):    
    if offset > 0:      
        listx = listx[offset:]      
        listy = listy[:len(listx)]  
    elif offset < 0:        
        offset = -offset        
        listy = listy[offset:]      
        listx = listx[:len(listy)]  
    if min(len(listx), len(listy)) < min_overlap:       
    # Error checking in main program should prevent us from ever being      
    # able to get here.     
        return   
    #raise Exception('Overlap too small: %i' % min(len(listx), len(listy))) 
    return correlation(listx, listy)  
    # cross correlate listx and listy with offsets from -span to span
def compare(listx, listy, span, step):  
    if span > min(len(listx), len(listy)):      
    # Error checking in main program should prevent us from ever being      
    # able to get here.     
        raise Exception('span >= sample size: %i >= %i\n' % (span, min(len(listx), len(listy))) + 'Reduce span, reduce crop or increase sample_time.')

    corr_xy = []    
    for offset in numpy.arange(-span, span + 1, step):      
        corr_xy.append(cross_correlation(listx, listy, offset)) 
    return corr_xy  
    # return index of maximum value in list
def max_index(listx):   
    max_index = 0   
    max_value = listx[0]    
    for i, value in enumerate(listx):       
        if value > max_value:           
            max_value = value           
            max_index = i   
    return max_index  

def get_max_corr(corr, source, target): 
    max_corr_index = max_index(corr)    
    max_corr_offset = -span + max_corr_index * step 
    print("max_corr_index = ", max_corr_index, "max_corr_offset = ", max_corr_offset)
    # report matches    
    if corr[max_corr_index] > threshold:        
        print(('%s and %s match with correlation of %.4f at offset %i' % (source, target, corr[max_corr_index], max_corr_offset))) 

def correlate(source, target):  
    fingerprint_source = calculate_fingerprints(source) 
    fingerprint_target = calculate_fingerprints(target)     
    corr = compare(fingerprint_source, fingerprint_target, span, step)  
    max_corr_offset = get_max_corr(corr, source, target)  

if __name__ == "__main__":    
    correlate(SOURCE_FILE, TARGET_FILE)

代码从： https：//shivama20505.medium.com/audioum.com/audio-signalals， - 比较-23E431ED2207

I don't know how to interprete your output, but below there's a code to get a number from 0 to 100 for the similarity from two audio files using python, it works by generating fingerprints from audio files and comparing them based out of them using cross correlation

It requires Chromaprint and FFMPEG installed, also it doesn't work for short audio files, if this is a problem, you can always reduce the speed of the audio like in this guide, be aware this is going to add a little noise.

# correlation.py
import subprocess
import numpy
# seconds to sample audio file for
sample_time = 500# number of points to scan cross correlation over
span = 150# step size (in points) of cross correlation
step = 1# minimum number of points that must overlap in cross correlation
# exception is raised if this cannot be met
min_overlap = 20# report match when cross correlation has a peak exceeding threshold
threshold = 0.5
# calculate fingerprint
def calculate_fingerprints(filename):
    fpcalc_out = subprocess.getoutput('fpcalc -raw -length %i %s' % (sample_time, filename))
    fingerprint_index = fpcalc_out.find('FINGERPRINT=') + 12
    # convert fingerprint to list of integers
    fingerprints = list(map(int, fpcalc_out[fingerprint_index:].split(',')))      
    return fingerprints  
    # returns correlation between lists
def correlation(listx, listy):
    if len(listx) == 0 or len(listy) == 0:
        # Error checking in main program should prevent us from ever being
        # able to get here.     
        raise Exception('Empty lists cannot be correlated.')    
    if len(listx) > len(listy):     
        listx = listx[:len(listy)]  
    elif len(listx) < len(listy):       
        listy = listy[:len(listx)]      

    covariance = 0  
    for i in range(len(listx)):     
        covariance += 32 - bin(listx[i] ^ listy[i]).count("1")  
    covariance = covariance / float(len(listx))     
    return covariance/32  
    # return cross correlation, with listy offset from listx
def cross_correlation(listx, listy, offset):    
    if offset > 0:      
        listx = listx[offset:]      
        listy = listy[:len(listx)]  
    elif offset < 0:        
        offset = -offset        
        listy = listy[offset:]      
        listx = listx[:len(listy)]  
    if min(len(listx), len(listy)) < min_overlap:       
    # Error checking in main program should prevent us from ever being      
    # able to get here.     
        return   
    #raise Exception('Overlap too small: %i' % min(len(listx), len(listy))) 
    return correlation(listx, listy)  
    # cross correlate listx and listy with offsets from -span to span
def compare(listx, listy, span, step):  
    if span > min(len(listx), len(listy)):      
    # Error checking in main program should prevent us from ever being      
    # able to get here.     
        raise Exception('span >= sample size: %i >= %i\n' % (span, min(len(listx), len(listy))) + 'Reduce span, reduce crop or increase sample_time.')

    corr_xy = []    
    for offset in numpy.arange(-span, span + 1, step):      
        corr_xy.append(cross_correlation(listx, listy, offset)) 
    return corr_xy  
    # return index of maximum value in list
def max_index(listx):   
    max_index = 0   
    max_value = listx[0]    
    for i, value in enumerate(listx):       
        if value > max_value:           
            max_value = value           
            max_index = i   
    return max_index  

def get_max_corr(corr, source, target): 
    max_corr_index = max_index(corr)    
    max_corr_offset = -span + max_corr_index * step 
    print("max_corr_index = ", max_corr_index, "max_corr_offset = ", max_corr_offset)
    # report matches    
    if corr[max_corr_index] > threshold:        
        print(('%s and %s match with correlation of %.4f at offset %i' % (source, target, corr[max_corr_index], max_corr_offset))) 

def correlate(source, target):  
    fingerprint_source = calculate_fingerprints(source) 
    fingerprint_target = calculate_fingerprints(target)     
    corr = compare(fingerprint_source, fingerprint_target, span, step)  
    max_corr_offset = get_max_corr(corr, source, target)  

if __name__ == "__main__":    
    correlate(SOURCE_FILE, TARGET_FILE)

Code converted into python 3 from: https://shivama205.medium.com/audio-signals-comparison-23e431ed2207

回复收藏 0 原文

~没有更多了~