Small embedded speech synthesis library / suggestions
Are there any easy-to-use free or cheap speech synthesis libraries for PIC and/or ARM embedded systems where code size is more important than speech quality? Nowadays a 1 MB package seems to be considered "compact", but a lot of microcontrollers are smaller than that. Back in the 1980s Apple hired a contractor to produce Macintalk, which offered reasonable-quality speech in a 26K package that ran on a 7.16MHz 68000, and a program called SAM could produce speech that wasn't quite as good, but still serviceable, with a 16K package that ran on a 1MHz 6502. The SpeakJet runs a speech-synthesis algorithm on some type of PIC.
I probably wouldn't particularly need to produce arbitrary speech, but would want to be able to speak messages formed from a number of pre-set words. Obviously it would be possible to simply prerecord all the messages, but with a vocabulary of e.g. 100 words, I would think that storing 16K worth of code plus maybe 1K worth of phonetic strings would be more compact than storing audio for 100 words.
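To put rough numbers on that comparison, here is a back-of-the-envelope sketch in C. The 16K and 1K figures come from the paragraph above; the average word length (500 ms), the 8 kHz sample rate, and the bit depths are purely illustrative assumptions, not measurements of any real recording.

```c
/* All audio figures below are illustrative assumptions, not measurements. */
enum {
    WORDS          = 100,   /* vocabulary size from the question */
    AVG_WORD_MS    = 500,   /* assumed average length of one spoken word */
    SAMPLE_RATE_HZ = 8000   /* assumed telephone-quality sample rate */
};

/* Bytes needed to store the whole vocabulary at a given bit depth. */
unsigned long audio_bytes(unsigned bits_per_sample)
{
    unsigned long samples_per_word =
        (unsigned long)SAMPLE_RATE_HZ * AVG_WORD_MS / 1000;
    return (unsigned long)WORDS * samples_per_word * bits_per_sample / 8;
}

/* Synthesizer footprint: 16K of code plus 1K of phonetic strings. */
unsigned long synth_bytes(void)
{
    return 16UL * 1024 + 1UL * 1024;
}
```

Under these assumptions `audio_bytes(8)` comes to 400000 bytes for 8-bit PCM and `audio_bytes(4)` to 200000 bytes for a 4-bit encoding, against 17408 bytes for `synth_bytes()`. Shorter words or a lower sample rate would narrow the gap, but probably not close it.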
Alternatively, if I wanted to store audio for 100 words, what would be the best way of generating a set of words that would flow naturally together? On older-style speech synthesizers, any given word could be spoken three ways: neutral inflection, falling inflection (as if followed by a period), or rising inflection (followed by a question mark). Words with neutral inflection could be spliced together in any order and sound fine. The text-to-wave tools I've found, though, seem to like to add finer details of inflection which sound "off" if words are cut apart and resequenced. Are there any tools which are designed for producing waves that can be concatenated and spliced nicely? If I do use such a tool, what audio format would be best for storing the waves so as to allow efficient decoding on a small microcontroller?
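The three-inflection scheme described above could be sketched roughly as follows. This is only an illustration of the idea, not any particular synthesizer's method: the type names, the `build_playlist` function, and the notion of a word table indexed by a small integer are all hypothetical.

```c
#include <stddef.h>

/* The three ways a word can be spoken, as described in the question. */
typedef enum { INFL_NEUTRAL, INFL_FALLING, INFL_RISING } inflection_t;

/* One playlist entry: which word, and which recorded variant of it. */
typedef struct {
    unsigned char word;        /* index into a hypothetical word table */
    inflection_t  inflection;
} clip_ref_t;

/*
 * Fill `out` with one entry per word. Every word but the last is neutral;
 * the last is falling for '.', rising for '?', and neutral otherwise.
 * Returns the number of entries written (at most `out_cap`).
 */
size_t build_playlist(const unsigned char *words, size_t n_words,
                      char terminator, clip_ref_t *out, size_t out_cap)
{
    size_t n = n_words < out_cap ? n_words : out_cap;
    size_t i;

    for (i = 0; i < n; i++) {
        out[i].word = words[i];
        out[i].inflection = INFL_NEUTRAL;
    }
    /* Only inflect the final word if the whole message fit in `out`. */
    if (n > 0 && n == n_words) {
        if (terminator == '.')
            out[n - 1].inflection = INFL_FALLING;
        else if (terminator == '?')
            out[n - 1].inflection = INFL_RISING;
    }
    return n;
}
```

The playback routine would then look up each (word, inflection) pair and stream the matching clip. Note that storing all three variants of every word roughly triples the audio storage compared with neutral-only clips.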
Comments (1)
Last time I did this I was able to add hardware like http://www.sparkfun.com/products/9578. There may be patent liabilities in your environment, as I ran into, that force you to use a commercial software stack or an off-the-shelf (OTS) chip.
Otherwise, I've used http://www.speech.cs.cmu.edu/flite/ for more lenient projects, and it worked well.