Where can I learn how to work with audio data formats?

Posted 2024-07-15 13:21:39

I'm working on an OpenGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as MP3s) and animate its mouth using the audio data. I've never really worked with audio before, so I'm not sure where to start, but some googling led me to believe my first step would be converting the MP3 to PCM.

I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).

Any tips on how to implement something like this, or pointers to resources, would be much appreciated. Thanks!

-S

Comments (1)

少女净妖师 2024-07-22 13:21:39

Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you (libmpg123, minimp3, and FFmpeg's libavcodec are common choices). Then you'll need to analyze the PCM data and do some signal processing on it.
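Once a decoder has handed you raw PCM, the first bit of processing is usually turning it into numbers that are easy to work with. The sketch below assumes the decoder output is interleaved signed 16-bit little-endian samples with no header (a common output format, but check what your decoder actually produces); `pcm16le_to_mono` is a hypothetical helper that downmixes to mono and scales each sample to the range [-1.0, 1.0).

```python
import struct
from typing import List


def pcm16le_to_mono(data: bytes, channels: int) -> List[float]:
    """Convert interleaved signed 16-bit little-endian PCM to mono floats.

    Assumption: `data` is raw decoder output (no WAV header) containing a
    whole number of frames, where one frame = `channels` samples of 2 bytes.
    """
    if channels <= 0:
        raise ValueError("channels must be positive")
    bytes_per_frame = 2 * channels
    if len(data) % bytes_per_frame != 0:
        raise ValueError("data length is not a whole number of frames")
    count = len(data) // 2
    # '<' = little-endian, 'h' = signed 16-bit integer
    samples = struct.unpack(f"<{count}h", data)
    mono = []
    for i in range(0, count, channels):
        frame = samples[i:i + channels]
        # Downmix by averaging the channels, then scale 16-bit range to [-1.0, 1.0)
        mono.append(sum(frame) / channels / 32768.0)
    return mono
```

For a mouth animation, mono is usually enough; if your decoder can output mono or floating-point samples directly, you can skip this step.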

Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise not to try to tackle it. I like your idea of simply basing it on the volume. One way to compute the current volume is to use a rolling window of some size (e.g. 1/16 second, which is about 2,756 frames at 44.1 kHz) and compute the average power of the sound wave over that window. That is, at frame T, you compute the average power over the N frames ending at T (frames T-N+1 through T), where N is the number of frames in your window.
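A minimal sketch of that rolling window, assuming the clip has already been decoded to a list of mono float samples. `window_power` and `power_at_time` are hypothetical names; the idea is that each animation frame you look up the current playback position in seconds and measure the power of the N samples that were just played.

```python
from typing import Sequence


def window_power(samples: Sequence[float], t: int, n: int) -> float:
    """Average power over the n samples ending at index t (inclusive)."""
    if n <= 0:
        raise ValueError("window size must be positive")
    if not 0 <= t < len(samples):
        return 0.0  # before the clip starts or after it ends: treat as silence
    start = max(0, t - n + 1)  # near the start of the clip the window is shorter
    window = samples[start:t + 1]
    return sum(x * x for x in window) / len(window)


def power_at_time(samples: Sequence[float], sample_rate: int,
                  seconds: float, window_seconds: float = 1 / 16) -> float:
    """Power of the window ending at the playback position `seconds`."""
    t = int(seconds * sample_rate)
    n = max(1, int(window_seconds * sample_rate))
    return window_power(samples, t, n)
```

Rescanning a 1/16-second window once per video frame costs only a few thousand multiplications, so the simple version is usually fine; a running sum of squares would avoid the rescan if you ever need it.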

Thanks to Parseval's theorem, we can compute the power in a wave without taking the Fourier transform or doing anything complicated: the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. You can then convert the power into a decibel rating by dividing it by some reference power (which can be 1 for simplicity), taking the base-10 logarithm, and multiplying by 10.
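The sketch below puts the last paragraph into code, under a few assumptions. `mean_power` is the time-domain shortcut; `dft_mean_power` computes the same quantity through a naive DFT, purely to show that Parseval's theorem makes the two agree (you would not use it at runtime). With samples scaled to [-1, 1] and a reference power of 1, the result is in dB relative to full scale, so typical speech levels come out negative. `mouth_openness` and its `quiet_db`/`loud_db` thresholds are hypothetical tuning values, not taken from any standard.

```python
import cmath
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Time-domain average power: sum of squares divided by sample count."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def dft_mean_power(samples: Sequence[float]) -> float:
    """Same quantity via a naive O(n^2) DFT, for checking only.

    Parseval (DFT form): sum |x[k]|^2 == (1/n) * sum |X[f]|^2,
    so the mean power is sum |X[f]|^2 / n^2.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    for f in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * f * k / n)
                    for k, x in enumerate(samples))
        total += abs(coeff) ** 2
    return total / (n * n)


def power_to_db(power: float, ref_power: float = 1.0,
                floor_db: float = -100.0) -> float:
    """10 * log10(power / ref_power), clamped so silence doesn't give -inf."""
    if power <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(power / ref_power))


def mouth_openness(db: float, quiet_db: float = -50.0,
                   loud_db: float = -10.0) -> float:
    """Map a dB level linearly onto 0.0 (closed) .. 1.0 (fully open)."""
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    x = (db - quiet_db) / (loud_db - quiet_db)
    return min(1.0, max(0.0, x))
```

Working in decibels rather than raw power tends to give more even-looking mouth movement, because loudness perception is roughly logarithmic; the thresholds will still need tuning by eye against your actual recordings.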
