How to get letters' positions relative to audio time in HuggingSound?
So I use an STT model (`SpeechRecognitionModel`). I understand how to get a sentence, but I wonder how to get the audio timings corresponding to the output letters. So how do I get the letters' positions relative to audio time in huggingface?
The HuggingSound creator here!

The transcriptions returned by the `transcribe` method contain the start/end timestamps, in milliseconds, of the audios that you passed to the method:

...

`start_timestamps[i]` gives you the time in milliseconds when the i-th letter was detected in the transcription. `end_timestamps[i]` gives you the time in milliseconds when the i-th letter stopped being detected in the transcription.

So you can use the `start_timestamps` and `end_timestamps` lists to get the timing and even the duration of each letter of an audio file:

...

And to get the letter position given a time: