I'm trying to implement speech transcription (voice to text) from a video. My approach is to break this down into 3 steps:
- Convert video to audio file (m4a/mp3)
- Pass audio to SFSpeechRecognizer request with audio file url
- Parse results
My issue is that I haven't found a way to convert the source video file (say, a .mov) into an audio-only file. The video's AVAsset doesn't report any audio tracks, yet the file still has audio when played (so the audio does exist).
I imagine that if I can solve step 1, then steps 2 and 3 are trivial, so my question is: what is the best way to convert a video file into an audio-only file that I can then use for transcription?
Comments (1)
You can use the FFmpegKit library to extract the audio part of the video.
The library example: https://github.com/tanersener/ffmpeg-kit/tree/main/apple#3-using
The ffmpeg command example to extract audio: https://stackoverflow.com/a/27413824/5707560
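For illustration, here is a rough Swift sketch of how the two links might be combined, followed by the transcription step from the question. It assumes FFmpegKit is already added to the project and imported as `ffmpegkit`, that the source audio stream can be stream-copied into an .m4a container (if it cannot, replacing `copy` with `aac` to re-encode may help), and that speech recognition authorization has already been granted. The function names `extractAudio` and `transcribe` are hypothetical and not part of either library.

```swift
import Foundation
import Speech
import ffmpegkit  // assumption: FFmpegKit is integrated as in the linked example

/// Hypothetical helper: strips the video stream and writes only the audio.
/// Returns true when ffmpeg reports success.
func extractAudio(from videoURL: URL, to audioURL: URL) -> Bool {
    // -y overwrites an existing output, -vn drops the video,
    // -acodec copy keeps the audio stream without re-encoding.
    let arguments = ["-y", "-i", videoURL.path, "-vn", "-acodec", "copy", audioURL.path]
    let session = FFmpegKit.execute(withArguments: arguments)
    return ReturnCode.isSuccess(session?.getReturnCode())
}

/// Hypothetical helper: passes the extracted audio file to SFSpeechRecognizer.
/// Assumes SFSpeechRecognizer.requestAuthorization has already succeeded.
func transcribe(audioURL: URL, completion: @escaping (String?) -> Void) {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        completion(nil)
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            // The best transcription of the whole file.
            completion(result.bestTranscription.formattedString)
        } else if error != nil {
            completion(nil)
        }
    }
}
```

Usage would be along the lines of calling `extractAudio(from:to:)` with the .mov URL and a temporary .m4a URL, then calling `transcribe(audioURL:completion:)` on the output if extraction succeeded. Note that `FFmpegKit.execute` runs synchronously, so it is probably best called off the main thread.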