How can I compute the similarity between two voice files of different lengths in Python?

Posted on 2025-01-20 12:57:05

I'd like to compare two voice files.
The first file (ref) and the comparison file (comp) are spoken by different people.
My hypothesis is that the more similar the two recordings are, the closer their pronunciation, intonation, and tone will be.
However, the two files have different lengths. Is it still possible to compare them?

!pip install librosa     # colab

import matplotlib.pyplot as plt
import librosa
import librosa.display

# librosa.load resamples to 22050 Hz by default; pass sr=None to keep the native rate
x_1, fs_1 = librosa.load('voice_tar.wav')
x_2, fs_2 = librosa.load('voice_comp.wav')

# Duration in seconds = number of samples / sample rate
print('<Voice_tar>', 'audio shape:', x_1.shape, 'length:', x_1.shape[0]/float(fs_1), 'secs')
print('<Voice_comp>', 'audio shape:', x_2.shape, 'length:', x_2.shape[0]/float(fs_2), 'secs')

# Plot both waveforms on shared axes so the difference in length is visible
fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True, figsize=(10, 3))
librosa.display.waveshow(x_1, sr=fs_1, ax=ax[0])
ax[0].set(title='Voice_tar')
ax[0].label_outer()

librosa.display.waveshow(x_2, sr=fs_2, ax=ax[1])
ax[1].set(title='Voice_comp')

The results are as follows.
<Voice_tar> audio shape: (43395,) length: 1.9680272108843537 secs
<Voice_comp> audio shape: (31673,) length: 1.4364172335600907 secs

This is the plot of the two voice files.
[Image: waveforms of the 2 voice files]

Also, how can I get a similarity measure with librosa.segment.cross_similarity()?

Comments (1)

生寂 2025-01-27 12:57:05

I am studying this problem too. It would be better to process the audio directly, but since there is no solution for that above, you can try turning this into a computer vision problem on images, as follows:

  1. Read the two audio files and plot their features into two images, for example:

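    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    duongdan = 'voice_tar.wav'  # path to an audio file (here, a file name from the question)
    show = False  # set to True to also display the waveform
    x, sr = librosa.load(duongdan)

    if show:
        librosa.display.waveshow(x, sr=sr, alpha=0.4)
        plt.show()

    fea = librosa.feature.spectral_centroid(y=x, sr=sr)[0]  # or any feature you want
    frames = range(len(fea))  # frame indices, used to compute the time axis
    t = librosa.frames_to_time(frames, sr=sr)
    plt.plot(t, fea, color='b')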
    
  2. Calculate the similarity between the two images, for example with a perceptual hash (phash) from https://pypi.org/project/ImageHash/ or https://github.com/thorn-oss/perception. I have also thought about comparing histograms, but haven't tried that yet.
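
To make the two steps above concrete, here is a minimal sketch of how they might fit together with the ImageHash package. The helper names (save_feature_plot, phash_distance, hamming_to_similarity) and the output file names are made up for illustration, and the mapping from Hamming distance to a 0..1 score is my own assumption, not something ImageHash provides.

```python
def hamming_to_similarity(distance, n_bits=64):
    """Map a Hamming distance to a 0..1 score (hypothetical helper, not part of ImageHash).

    n_bits=64 assumes the default 8x8 phash.
    """
    return 1.0 - distance / n_bits


def save_feature_plot(times, values, out_path):
    """Step 1: draw a feature curve without axes and save it as an image."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(4, 2))
    ax.plot(times, values, color="b")
    ax.axis("off")  # keep only the curve so the hash is not dominated by ticks and labels
    fig.savefig(out_path, dpi=100)
    plt.close(fig)


def phash_distance(image_path_a, image_path_b):
    """Step 2: Hamming distance between the perceptual hashes of two images."""
    from PIL import Image  # imported lazily; requires Pillow and ImageHash
    import imagehash

    hash_a = imagehash.phash(Image.open(image_path_a))
    hash_b = imagehash.phash(Image.open(image_path_b))
    return hash_a - hash_b  # subtracting two ImageHash objects gives the Hamming distance
```

For example, after saving one plot per file with save_feature_plot(t, fea, 'tar.png') and save_feature_plot(t2, fea2, 'comp.png'), hamming_to_similarity(phash_distance('tar.png', 'comp.png')) would give a rough score where 1.0 means identical hashes. Treat it as a coarse indicator only, since the hash sees the shape of the plotted curve and nothing else.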

Have you looked at the question "How do we check similarity between hash values of two audio files in Python?" and at the librosa documentation for cross_similarity: https://librosa.org/doc/main/generated/librosa.segment.cross_similarity.html#librosa.segment.cross_similarity
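
As a rough sketch of how librosa.segment.cross_similarity() might be applied to two clips of different lengths: it compares feature frames rather than raw samples, so the two inputs only need the same number of feature rows, not the same number of frames. The choice of MFCC features, the cosine metric, and the summarize_cross_similarity helper that collapses the matrix into a single number are all my own assumptions for illustration.

```python
import numpy as np


def summarize_cross_similarity(xsim):
    """Collapse a cross-similarity matrix into one number (hypothetical heuristic).

    xsim has shape (n_ref, n_comp). For each comparison frame (column), take its
    best affinity to any reference frame, then average over all columns.
    """
    xsim = np.asarray(xsim, dtype=float)
    return float(xsim.max(axis=0).mean())


def cross_similarity_score(path_ref, path_comp, n_mfcc=13):
    """Compute an affinity matrix between two audio files and a summary score."""
    import librosa  # imported lazily so the helper above works without librosa

    y_ref, sr = librosa.load(path_ref)           # resampled to 22050 Hz by default
    y_comp, _ = librosa.load(path_comp, sr=sr)   # force the same sample rate

    # Feature matrices of shape (n_mfcc, n_frames); n_frames differs between files
    mfcc_ref = librosa.feature.mfcc(y=y_ref, sr=sr, n_mfcc=n_mfcc)
    mfcc_comp = librosa.feature.mfcc(y=y_comp, sr=sr, n_mfcc=n_mfcc)

    # Result has shape (n_ref_frames, n_comp_frames); affinities lie in [0, 1]
    xsim = librosa.segment.cross_similarity(
        mfcc_comp, mfcc_ref, metric="cosine", mode="affinity"
    )
    return summarize_cross_similarity(xsim), xsim
```

With the files from the question this would be called as score, xsim = cross_similarity_score('voice_tar.wav', 'voice_comp.wav'). As far as I understand, the affinity bandwidth is estimated from the data when it is not given, so the score is relative to each pair of files and probably should not be compared across different pairs without fixing the bandwidth parameter.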
