Fourier transform to transpose the key of a wav file

Posted 2024-08-28 18:34:03

I want to write an app to transpose the key a wav file plays in (for fun; I know there are apps that already do this). My main understanding of how this might be accomplished is to

1) chop the audio file into very small blocks (say 1/10 of a second)

2) run an FFT on each block

3) phase shift the frequency space up or down depending on what key I want

4) use an inverse FFT to return each block to the time domain

5) glue all the blocks together

But now I'm wondering whether the transformed blocks would no longer be continuous when I try to glue them back together. Any ideas on how I should do this to guarantee continuity, or am I just worrying about nothing?
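For step 3, it may help to pin down what "depending on what key I want" means numerically. In equal temperament, moving by one semitone multiplies every frequency by 2^(1/12). The sketch below just evaluates that ratio; the function name `semitone_ratio` is made up for illustration.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a transposition of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)

# Example: A4 (440 Hz) moved up three semitones lands near C5.
a4_hz = 440.0
c5_hz = a4_hz * semitone_ratio(3)
```

Because the ratio is multiplicative, transposing scales each frequency rather than adding a fixed offset in hertz.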

Comments (4)

定格我的天空 2024-09-04 18:34:03

Overlap the time samples for each block by half so that each block after the first consists of the last N/2 samples from the previous block and N/2 new samples. Be sure to apply some window to the samples before the transform.

After shifting the frequency, perform an inverse FFT and use the middle N/2 samples from each block. You'll need to adjust the final gain after the IFFT.
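A minimal NumPy sketch of this kind of half-overlapped, windowed block processing follows. It is an illustration under stated assumptions, not necessarily what the answer had in mind: instead of keeping only the middle N/2 samples, it sums the overlapping halves (overlap-add) with a periodic Hann window, so an unmodified spectrum reconstructs the input with no extra gain correction. The names `process_blocks` and `shift_bins` are hypothetical.

```python
import numpy as np

def shift_bins(spectrum, bins):
    """Hypothetical helper: move every FFT bin up by `bins` positions (bins >= 0)."""
    shifted = np.zeros_like(spectrum)
    if bins == 0:
        shifted[:] = spectrum
    else:
        shifted[bins:] = spectrum[:-bins]
    return shifted

def process_blocks(x, n=1024, modify=None):
    """Window -> FFT -> modify -> IFFT on 50%-overlapping blocks, then overlap-add."""
    hop = n // 2
    # Periodic Hann window: copies spaced n/2 apart sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([x, np.zeros(n)])  # lets the last block run past the end
    out = np.zeros(len(x) + n)
    for start in range(0, len(x), hop):
        spectrum = np.fft.rfft(padded[start:start + n] * window)
        if modify is not None:
            spectrum = modify(spectrum)
        # Sum the overlapping halves instead of butting blocks together.
        out[start:start + n] += np.fft.irfft(spectrum, n)
    return out[:len(x)]

# Usage: a tone with 32 cycles per block, moved up by 8 bins.
tone = np.sin(2 * np.pi * 32 * np.arange(8192) / 1024)
unchanged = process_blocks(tone)
shifted = process_blocks(tone, modify=lambda s: shift_bins(s, 8))
```

The first half block fades in because only one window covers it. A constant bin shift moves every frequency by the same number of hertz, so harmonic ratios are not preserved. For arbitrary shifts, the phase of each block generally also needs attention to keep neighbouring blocks coherent; this sketch does not attempt that.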

Of course, mixing the time samples with a sine wave and then low-pass filtering provides the same kind of shift in the time domain: the mixer produces sum and difference frequencies, and the filter keeps the lower one. The frequency of the mixer would be the desired frequency difference.
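As a rough sketch of the mixer idea, assuming a real cosine mixer and a simple windowed-sinc FIR low-pass built with NumPy (the function name, the tap count, and the cutoff are illustrative choices, not taken from the answer):

```python
import numpy as np

def mix_and_lowpass(x, sample_rate, shift_hz, cutoff_hz, taps=255):
    """Multiply by a cosine, then low-pass to keep the difference frequency."""
    t = np.arange(len(x)) / sample_rate
    # A component at f becomes two components, at f - shift_hz and f + shift_hz.
    mixed = np.asarray(x, dtype=float) * np.cos(2 * np.pi * shift_hz * t)
    # Windowed-sinc low-pass FIR, normalised to unity gain at DC.
    k = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / sample_rate * k) * np.hamming(taps)
    h /= h.sum()
    # Factor 2 restores the amplitude halved by the mixing product.
    return 2 * np.convolve(mixed, h, mode="same")

# Usage: a 1000 Hz tone moved down by 200 Hz at an 8 kHz sample rate.
fs = 8000
tone = np.cos(2 * np.pi * 1000 * np.arange(fs) / fs)
lowered = mix_and_lowpass(tone, fs, shift_hz=200, cutoff_hz=1000)
```

This only separates cleanly when the sum and difference bands do not overlap, which holds for a single tone but generally not for full-band audio. In that case a single-sideband (analytic-signal) mixer is the usual alternative.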

和影子一齐双人舞 2024-09-04 18:34:03

For speech you might want to look at PSOLA - this is a popular algorithm for pitch-shifting and/or time stretching/compression which is a little more sophisticated than the basic overlap-add method, but not much more complex.

If you need to process non-speech samples, e.g. music, there are several possibilities, but the overlap-add FFT/modify/IFFT approach mentioned in other answers is probably the best bet.

紫轩蝶泪 2024-09-04 18:34:03

Found this great article on the subject, for anyone trying it in the future!

笑脸一如从前 2024-09-04 18:34:03

You may have to find a zero-crossing between the blocks to glue the individual wavs back together. Otherwise you may find that you are getting clicks or pops between the blocks.
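One way to locate such a splice point, sketched with NumPy. The helper name `nearest_zero_crossing` is hypothetical, and the sketch only finds sign changes; it does not check that the slopes on either side of the join match.

```python
import numpy as np

def nearest_zero_crossing(x, index):
    """Return the sample index closest to `index` where the signal changes sign."""
    negative = np.signbit(np.asarray(x, dtype=float))
    # A crossing is reported at i + 1 when x[i] and x[i + 1] have opposite signs.
    crossings = np.flatnonzero(negative[:-1] != negative[1:]) + 1
    if crossings.size == 0:
        return index  # no sign change anywhere; fall back to the requested index
    return int(crossings[np.argmin(np.abs(crossings - index))])

# Usage: a sine with a 100-sample period, offset so no sample is exactly zero.
wave = np.sin(2 * np.pi * (np.arange(1000) + 0.5) / 100)
cut = nearest_zero_crossing(wave, 130)
```

Cutting both blocks at crossings with the same direction (rising or falling) tends to reduce the remaining click further.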
