Comparing voice WAV recordings in Android, or a voice tag (voice command) API
I'm developing an app and need a way to check whether two voice recordings match. I know a speech recognizer can do this, but since (I think) it first has to convert the speech to a string, it isn't well suited to languages the recognizer doesn't support. Any ideas? I'm thinking of something like the voice tags on older phones, which simply compared the voice input against a recording made earlier during setup.
Comments (2)
A relatively simple way to do this is to use an FFT (Fast Fourier Transform) to convert the time-domain data of the original WAV file into frequency-domain data, where the magnitude of each value in the transformed array gives the relative intensity of a particular frequency band.
Even if the same person speaks the same word twice, the time-domain data in the two WAV files will still be numerically very different. Converting both WAV files to the frequency domain (using the same FFT window size for both, even if the two files differ slightly in length) produces frequency arrays that are much more similar to each other than the original WAV data were.
Unfortunately, I haven't been able to find any FFT libraries specifically for Android. Here's a question that references some Java-based libraries:
Signal processing library in Java?
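The frequency-domain comparison described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the answer: the audio has already been decoded to mono `double` samples, the window is 512 samples with a Hann taper, the spectra are averaged over non-overlapping frames, and cosine similarity is used as the match score. The class and method names (`SpectrumCompare`, `averageSpectrum`, `cosineSimilarity`, `tone`) are hypothetical, and the FFT is a small hand-written radix-2 version so that no external library is needed.

```java
/** Sketch: compare two recordings by their average magnitude spectra. */
class SpectrumCompare {
    static final int WINDOW = 512;      // assumed FFT size, must be a power of two
    static final double RATE = 8000.0;  // assumed sample rate, used only by the demo tones

    /** In-place iterative radix-2 FFT; array length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) {        // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {    // butterfly passes
            double ang = -2 * Math.PI / len;
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = i + k, b = i + k + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr;        im[a] += xi;
                }
            }
        }
    }

    /** Average magnitude per frequency bin over non-overlapping Hann-windowed frames. */
    static double[] averageSpectrum(double[] samples) {
        double[] avg = new double[WINDOW / 2];
        int frames = Math.max(1, samples.length / WINDOW); // trailing partial frame is dropped
        for (int f = 0; f < frames; f++) {
            double[] re = new double[WINDOW], im = new double[WINDOW];
            for (int i = 0; i < WINDOW; i++) {
                int idx = f * WINDOW + i;
                double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (WINDOW - 1));
                re[i] = idx < samples.length ? samples[idx] * hann : 0.0; // zero-pad short input
            }
            fft(re, im);
            for (int k = 0; k < avg.length; k++) avg[k] += Math.hypot(re[k], im[k]) / frames;
        }
        return avg;
    }

    /** Cosine similarity of two equal-length spectra; 0 if either is all zeros. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Synthetic test tone standing in for decoded WAV samples. */
    static double[] tone(double freqHz, double phase, int n) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) s[i] = Math.sin(2 * Math.PI * freqHz * i / RATE + phase);
        return s;
    }

    public static void main(String[] args) {
        double[] a = averageSpectrum(tone(440, 0.0, 4096));
        double[] b = averageSpectrum(tone(440, 1.3, 4500)); // same pitch, shifted and longer
        double[] c = averageSpectrum(tone(1000, 0.0, 4096));
        System.out.println("same pitch:      " + cosineSimilarity(a, b));
        System.out.println("different pitch: " + cosineSimilarity(a, c));
    }
}
```

Averaging over the whole recording discards timing information, so two different words with similar overall spectra could still score as similar; any match threshold would have to be tuned on real recordings.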
One idea is to compare the voices by the similarity of their spectrograms. Spectrogram features are fairly robust to noise, which makes them a good basis for comparing two voices.
If you take this approach, you should first extract the features of each voice, and then you need a way to compare the features in the two spectrograms, which is a pattern recognition problem.
This API, http://code.google.com/p/musicg-sound-api/, is written in Java and can be used on Android. It produces the spectrogram of a wave file.
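The answer leaves the comparison step open. One classic option for matching a spoken word against a stored template is dynamic time warping (DTW), which aligns two sequences of spectrogram frames even when one was spoken faster than the other. The sketch below is an illustration of that technique, not the answerer's method and not part of the musicg API. It assumes each recording has already been turned into a `double[][]` of frames (one row per time slice, one column per frequency band), and the names `SpectrogramDtw`, `frameDistance`, and `dtw` are hypothetical.

```java
import java.util.Arrays;

/** Sketch: dynamic time warping distance between two spectrogram-like frame sequences. */
class SpectrogramDtw {

    /** Euclidean distance between two frames of equal length. */
    static double frameDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** DTW cost normalized by total length; 0 means the sequences align perfectly. */
    static double dtw(double[][] x, double[][] y) {
        int n = x.length, m = y.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDistance(x[i - 1], y[j - 1]);
                // Extend the cheapest of: step in x only, step in y only, step in both.
                double best = Math.min(cost[i - 1][j - 1], Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m] / (n + m);
    }

    public static void main(String[] args) {
        double[] lo = {1, 0}, hi = {0, 1};            // toy two-band frames
        double[][] template  = {lo, lo, hi, hi};
        double[][] stretched = {lo, lo, lo, hi, hi, hi, hi}; // same pattern, spoken slower
        double[][] reversed  = {hi, hi, lo, lo};
        System.out.println("template vs stretched: " + dtw(template, stretched));
        System.out.println("template vs reversed:  " + dtw(template, reversed));
    }
}
```

A lower distance suggests a closer match; the cut-off for accepting a match is not something this sketch can supply and would need to be chosen from real recordings.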