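Negative SDR results when evaluating audio source separation
I am trying to evaluate my audio source separation model with the eval_mus_track function of the museval package (https://pypi.org/project/museval/). The model was trained to predict vocals, and its output resembles the actual vocals, yet the evaluation metrics, such as SDR, are negative.
Here is the function I use to generate the metrics: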
import numpy as np
import museval

def estimate_and_evaluate(track):
    # track.audio is stereo, so each channel is predicted separately.
    # model_5 is the trained separation model, defined elsewhere.
    vocals_predicted_channel_1, accompaniment_predicted_channel_1, _ = model_5.predict(np.squeeze(track.audio[:, 0]))
    vocals_predicted_channel_2, accompaniment_predicted_channel_2, _ = model_5.predict(np.squeeze(track.audio[:, 1]))
    # Stack the two channels and transpose to (samples, channels).
    vocals = np.squeeze(np.array([vocals_predicted_channel_1.wav_file, vocals_predicted_channel_2.wav_file])).T
    accompaniment = np.squeeze(np.array([accompaniment_predicted_channel_1.wav_file, accompaniment_predicted_channel_2.wav_file])).T
    estimates = {
        'vocals': vocals,
        'accompaniment': accompaniment
    }
    scores = museval.eval_mus_track(track, estimates)
    print(scores)
The metric values I get are:
vocals ==> SDR: -3.776 SIR: 4.621 ISR: -0.005 SAR: -30.538
accompaniment ==> SDR: -0.590 SIR: 1.704 ISR: -0.006 SAR: -16.613
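These results do not make sense. First, the accompaniment prediction is pure noise, because the model was trained only on vocals, yet it gets the higher SDR. Second, the predicted vocals have a plot very similar to that of the actual vocals, but they still get a negative SDR. In the following plots, the top is the actual vocals and the bottom is the predicted source:
Channel 2: (plot omitted)
I also tried shifting the predicted vocals, but that made the results worse.
Any idea what is causing this problem?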
Comments (1)
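The signal-to-distortion ratio (SDR) is actually the logarithm of a ratio. See equation (12) of this article:
https://hal.inria.fr/inria-00630985/PDF/vincent_SigPro11.pdf
So an SDR of 0 means that the signal is equal to the distortion, and an SDR below 0 means that there is more distortion than signal. If the audio doesn't sound like it has more distortion than signal, the cause is often a sample alignment problem.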
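To make the "logarithm of a ratio" concrete, here is a minimal sketch of a plain energy-ratio SDR. It is not the BSS Eval computation that museval performs, which is more involved; the helper name plain_sdr_db and the 440 Hz test tone are assumptions made up for illustration.

```python
import numpy as np

def plain_sdr_db(reference, estimate, eps=1e-12):
    """10 * log10(signal energy / distortion energy), in dB.

    The distortion is taken to be everything in the estimate that
    differs from the reference: distortion = reference - estimate.
    eps only guards against division by zero.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    distortion_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (distortion_energy + eps))

# Hypothetical test signal: one second of a 440 Hz tone at 44.1 kHz.
t = np.arange(44100) / 44100.0
reference = np.sin(2 * np.pi * 440.0 * t)

# Silence as the estimate: the distortion is the whole signal, ratio is 1.
sdr_silent = plain_sdr_db(reference, np.zeros_like(reference))
# Polarity-inverted estimate: the distortion has twice the signal amplitude.
sdr_inverted = plain_sdr_db(reference, -reference)
# Slightly attenuated estimate: the distortion is a tenth of the amplitude.
sdr_close = plain_sdr_db(reference, 0.9 * reference)

print(sdr_silent, sdr_inverted, sdr_close)
```

Under this simplified definition, an estimate can have the same shape as the reference on a waveform plot (for example, with inverted polarity) and still score below 0 dB, because the ratio is computed sample by sample.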
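When you look at equation (12), you can see that the calculation depends strongly on preserving the exact sample alignment between the predicted and ground-truth audio. It can be hard to tell from waveform plots, or even from listening, whether the samples are misaligned. A zoomed-in plot in which you can see each individual sample can help you make sure the ground-truth and predicted samples line up exactly. If the prediction is shifted by even a single sample, the computed SDR will not reflect the actual SDR.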
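As a complement to zooming in on a plot, one way to check alignment numerically is to estimate the lag with a cross-correlation. The sketch below is an illustration under stated assumptions, not part of museval: the helpers estimate_lag, align and plain_sdr_db are hypothetical names, the SDR is a plain energy ratio rather than the BSS Eval metric, and white noise stands in for one channel of the true vocals. White noise is close to a worst case; a signal dominated by low frequencies would be hurt much less by a one-sample shift.

```python
import numpy as np

def estimate_lag(reference, estimate):
    """Return d such that estimate[n] ~ reference[n - d].

    A positive d means the estimate is late relative to the reference.
    """
    xcorr = np.correlate(estimate, reference, mode="full")
    # Index 0 of the 'full' output corresponds to a lag of -(len(reference) - 1).
    return int(np.argmax(xcorr)) - (len(reference) - 1)

def align(reference, estimate, lag):
    """Trim both signals so that they overlap sample for sample."""
    if lag > 0:
        return reference[:-lag], estimate[lag:]
    if lag < 0:
        return reference[-lag:], estimate[:lag]
    return reference, estimate

def plain_sdr_db(reference, estimate, eps=1e-12):
    """Plain energy-ratio SDR in dB; eps guards against division by zero."""
    distortion = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps))

# Hypothetical data: an otherwise perfect estimate that is one sample late.
rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)
estimate = np.concatenate([np.zeros(1), reference[:-1]])

lag = estimate_lag(reference, estimate)
sdr_before = plain_sdr_db(reference, estimate)
ref_aligned, est_aligned = align(reference, estimate, lag)
sdr_after = plain_sdr_db(ref_aligned, est_aligned)

print(lag, sdr_before, sdr_after)
```

For stereo audio the same check can be run on each channel separately. np.correlate is quadratic in the signal length, so for full-length tracks an FFT-based correlation such as scipy.signal.correlate, or a short excerpt of the track, is likely the more practical choice.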