How to write a function like IsWordPronounceable(SomeWord:String): boolean;

Posted on 2024-09-02 23:47:05

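I want to write a function IsWordPronounceable(SomeWord:String): boolean; for English. I am using SAPI speech recognition and I need this function. I use the Delphi compiler, but C/C#/C++ or any other language is fine. Please help; I don't know how to start.

At first I thought adding grammar rules would solve the problem. The scenario is highlighting text as the user speaks it, but the engine cannot recognize words that are not pronounceable.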

Answer by 云裳, 2024-09-09 23:47:05

This is not exactly easy to do. The way I would do it is with some simple statistical analysis.

Start by downloading a dictionary of English words (or of any language, really; you just need a list of words that are known to be "pronounceable"). Then take each word in the dictionary and break it into overlapping 3-letter blocks. Given the word "dictionary", you'd break it up into "dic", "ict", "cti", "tio", "ion", "ona", "nar", and "ary". Add every block from every word in the dictionary to a collection that maps each three-letter block to the number of times it appears. Something like this:

"dic" -> 36365
"ict" -> 2721
"cti" -> 532

And so on. Next, normalize the counts by dividing each one by the total number of words in the dictionary. That gives you a mapping from each three-letter combination to roughly the fraction of dictionary words that contain it (exactly so if each block is counted at most once per word).
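The splitting, counting, and normalizing steps can be sketched as follows. This is a minimal illustration in Python (the question accepts any language), not the answer's own code: the names `break_into_blocks` and `build_block_frequency` and the four-word sample list are hypothetical, and counting each block once per word is an assumption made so that the result really is a fraction of words.

```python
from collections import Counter


def break_into_blocks(word, size=3):
    """Return the overlapping blocks of `size` letters in `word`, lowercased."""
    word = word.lower()
    # A word shorter than `size` yields an empty list.
    return [word[i:i + size] for i in range(len(word) - size + 1)]


def build_block_frequency(words, size=3):
    """Map each block to the fraction of words that contain it."""
    counts = Counter()
    total = 0
    for word in words:
        total += 1
        # set() counts a block once per word, so n / total is a fraction of words.
        counts.update(set(break_into_blocks(word, size)))
    return {block: n / total for block, n in counts.items()}


# Tiny stand-in word list (assumption); a real run would load a full dictionary file.
sample_words = ["dictionary", "diction", "action", "nation"]
block_frequency = build_block_frequency(sample_words)

print(break_into_blocks("dictionary"))
print(block_frequency["ion"], block_frequency["dic"])
```

With a real word list the table would hold thousands of blocks; the sample here only shows the shape of the data.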

Finally, implement your IsWordPronounceable method something like this:

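// blockFrequency is assumed to be a Dictionary<string, double> built from the word list.
bool IsWordPronounceable(string word)
{
    string[] threeLetterBlocks = BreakIntoThreeLetterBlocks(word);
    foreach (string block in threeLetterBlocks)
    {
        // A block that never occurs in the dictionary is treated as frequency 0;
        // indexing blockFrequency[block] directly would throw KeyNotFoundException.
        if (!blockFrequency.TryGetValue(block, out double frequency) || frequency < THRESHOLD)
            return false;
    }
    return true;
}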

Obviously, there are a few parameters you'll want to "tune". THRESHOLD is one; the block size is another, and 2, 3, or 4 letters might each work better. I think it will take some experimentation to get it right.
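One way to approach that tuning is to score a few candidate settings against words you have labeled by hand. The sketch below is only an illustration in Python under stated assumptions: the function names, the eight-word training list, the four labeled examples, and the candidate thresholds are all hypothetical, and blocks missing from the table are treated as frequency 0.

```python
def blocks(word, size):
    """Overlapping blocks of `size` letters, lowercased."""
    word = word.lower()
    return [word[i:i + size] for i in range(len(word) - size + 1)]


def build_frequency(words, size):
    """Fraction of words containing each block (each block counted once per word)."""
    counts = {}
    for word in words:
        for block in set(blocks(word, size)):
            counts[block] = counts.get(block, 0) + 1
    return {block: n / len(words) for block, n in counts.items()}


def is_word_pronounceable(word, frequency, threshold, size):
    # Unseen blocks count as 0.0; a word shorter than `size` has no blocks and passes.
    return all(frequency.get(b, 0.0) >= threshold for b in blocks(word, size))


def accuracy(labeled, frequency, threshold, size):
    """Share of hand-labeled words that the check classifies correctly."""
    hits = sum(
        is_word_pronounceable(word, frequency, threshold, size) == label
        for word, label in labeled
    )
    return hits / len(labeled)


# Hypothetical toy data; real tuning needs a full dictionary and many labeled words.
training_words = ["nation", "station", "ration", "motion",
                  "notion", "potion", "mention", "tension"]
labeled = [("ratio", True), ("ten", True), ("xqzvkt", False), ("ntnmtn", False)]

for size in (2, 3):
    frequency = build_frequency(training_words, size)
    for threshold in (0.1, 0.5):
        print(size, threshold, accuracy(labeled, frequency, threshold, size))
```

On a word list this small, a perfectly pronounceable word such as "lotion" is rejected at any positive threshold because the block "lot" never occurs in the training words, which is a reminder that the dictionary needs to be large before the threshold means much.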