Simple technique to upsample/interpolate video features?
I'm trying to analyse audio and visual features in tandem. My audio speech features are mel-frequency cepstrum co-efficients sampled at 100fps using the Hidden Markov Model Toolkit. My visual features come from a lip-tracking programme I built and are sampled at 29.97fps.
I know that I need to interpolate my visual features so that the sample rate is also 100fps, but I can't find a nice explanation or tutorial on how to do this online. Most of the help I have found comes from the speech recognition community which assumes a knowledge of interpolation on behalf of the reader, i.e. most cover the step with a simple "interpolate the visual features so that the sample rate equals 100fps".
Can anyone point me in the right direction?
Thanks a million
1 Answer
Since face movement is not low-pass filtered prior to video capture, most of the classic DSP interpolation methods may not apply. You might as well try linear interpolation of your feature vectors to map them from one set of time points onto another: just pick the 2 closest video frames and interpolate between them to get the extra data points. You could also try spline interpolation if your facial tracking algorithm measures accelerations in face motion.
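A minimal sketch of the linear approach with NumPy, assuming your visual features are stored as an `(n_frames, n_dims)` array (the function name, shapes, and the 29.97/100 fps rates are illustrative, taken from the question):

```python
import numpy as np

def upsample_features(features, src_fps=29.97, dst_fps=100.0):
    """Linearly interpolate each feature dimension onto the target rate.

    features: (n_frames, n_dims) array sampled at src_fps.
    Returns an array sampled at dst_fps over the same time span.
    """
    features = np.asarray(features, dtype=float)
    n_src = features.shape[0]
    src_times = np.arange(n_src) / src_fps          # timestamp of each video frame
    duration = src_times[-1]
    n_dst = int(np.floor(duration * dst_fps)) + 1   # target timestamps inside the span
    dst_times = np.arange(n_dst) / dst_fps
    # np.interp handles one 1-D signal at a time; for each feature dimension it
    # finds the two bracketing source frames and interpolates between them.
    return np.column_stack([
        np.interp(dst_times, src_times, features[:, d])
        for d in range(features.shape[1])
    ])

# e.g. 30 frames of 2-D lip features -> 97 frames at 100 fps
lip = np.random.rand(30, 2)
up = upsample_features(lip)
print(up.shape)  # prints (97, 2)
```

For spline interpolation you could swap `np.interp` for `scipy.interpolate.CubicSpline` fitted on `src_times` and evaluated at `dst_times`, at the cost of possible overshoot between frames.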