Measure the pronounceability of a word?

Posted on 2024-07-29 02:57:39

I'm tinkering with a domain name finder and want to favour those words which are easy to pronounce.

Example: nameoic.com (bad) versus namelet.com (good).

I was thinking something to do with soundex might be appropriate, but it doesn't look like I can use it to produce some sort of comparative score.

PHP code for the win.


Comments (3)

挽容 2024-08-05 02:57:39

Here is a function which should work with the most common words... It should give you a nice result between 1 (perfect pronounceability according to the rules) and 0.

The following function is far from perfect (it doesn't quite like words like Tsunami [0.857]), but it should be fairly easy to tweak for your needs.

<?php
// Score: 1
echo pronounceability('namelet') . "\n";

// Score: 0.71428571428571
echo pronounceability('nameoic') . "\n";

function pronounceability($word) {
    static $vowels = array
        (
        'a',
        'e',
        'i',
        'o',
        'u',
        'y'
        );

    static $composites = array
        (
        'mm',
        'll',
        'th',
        'ing'
        );

    if (!is_string($word)) return false;

    // Remove non letters and put in lowercase
    $word = preg_replace('/[^a-z]/i', '', $word);
    $word = strtolower($word);

    // Special case
    if ($word == 'a') return 1;

    $len = strlen($word);

    // Let's not parse an empty string
    if ($len == 0) return 0;

    $score = 0;
    $pos = 0;

    while ($pos < $len) {
        // Check if is allowed composites
        foreach ($composites as $comp) {
            $complen = strlen($comp);

            // Use <= so a composite at the very end of the word (e.g. "ing") can match
            if (($pos + $complen) <= $len) {
                $check = substr($word, $pos, $complen);

                if ($check == $comp) {
                    $score += $complen;
                    $pos += $complen;
                    continue 2;
                }
            }
        }

        // Is it a vowel? If so, check if previous wasn't a vowel too.
        if (in_array($word[$pos], $vowels)) {
            if (($pos - 1) >= 0 && !in_array($word[$pos - 1], $vowels)) {
                $score += 1;
                $pos += 1;
                continue;
            }
        } else { // Not a vowel, check if next one is, or if is end of word
            if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
                $score += 2;
                $pos += 2;
                continue;
            } elseif (($pos + 1) == $len) {
                $score += 1;
                break;
            }
        }

        $pos += 1;
    }

    return $score / $len;
}

不语却知心 2024-08-05 02:57:39

I think the problem could be boiled down to parsing the word into a candidate set of phonemes, then using a predetermined list of phoneme pairs to determine how pronounceable the word is.

For example: "skill" phonetically is "/s/k/i/l/". "/s/k/", "/k/i/", "/i/l/" should all have high pronounceability scores, so the word should score highly.

"skpit" phonetically is "/s/k/p/i/t/". "/k/p/" should have a low pronounceability score, so the word should score low.
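
A minimal sketch of the pair-lookup idea, under some loud assumptions: letters stand in for phonemes (a real version would need a grapheme-to-phoneme step), the `$pairScores` table and its values are made up for illustration, and the hypothetical `pairScore()` helper takes the lowest pair score as the word score so that one awkward pair drags the whole word down.

```php
<?php
// Score a word by its weakest adjacent pair.
// ASSUMPTION: letters approximate phonemes; $pairScores is a hypothetical,
// hand-written table, not measured data.
function pairScore(string $word, array $pairScores, float $default = 0.5): float
{
    // Keep letters only, lowercase.
    $word = strtolower(preg_replace('/[^a-z]/i', '', $word));
    $len = strlen($word);

    if ($len === 0) return 0.0; // nothing to pronounce
    if ($len === 1) return 1.0; // a single letter has no pairs to fail on

    $worst = 1.0;
    for ($i = 0; $i < $len - 1; $i++) {
        $pair = substr($word, $i, 2);
        // Pairs missing from the table get a neutral default score.
        $worst = min($worst, $pairScores[$pair] ?? $default);
    }

    return $worst;
}

// Illustrative values only.
$pairScores = array(
    'sk' => 0.9, 'ki' => 0.9, 'il' => 0.9, 'll' => 0.8,
    'kp' => 0.1, 'pi' => 0.9, 'it' => 0.9,
);

echo pairScore('skill', $pairScores) . "\n";
echo pairScore('skpit', $pairScores) . "\n";
```

Averaging the pair scores instead of taking the minimum is an equally plausible choice; it is more forgiving of a single bad pair in a long word.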

遗弃M 2024-08-05 02:57:39

Use a Markov model (on letters, not words, of course). The probability of a word is a pretty good proxy for ease of pronunciation. You'll have to normalize for length, since longer words are inherently less probable.
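
A rough sketch of what that could look like in PHP: a first-order (bigram) letter model with add-one smoothing, scored as the average log-probability per transition so that length is normalized out. The function names `trainBigrams()` and `markovScore()` and the ten-word training list are hypothetical; a usable model would be trained on a large dictionary.

```php
<?php
// Count letter-to-letter transitions. '^' and '$' mark word start and end.
function trainBigrams(array $words): array
{
    $counts = array();
    foreach ($words as $w) {
        $w = '^' . strtolower($w) . '$';
        for ($i = 0, $n = strlen($w) - 1; $i < $n; $i++) {
            $a = $w[$i];
            $b = $w[$i + 1];
            $counts[$a][$b] = ($counts[$a][$b] ?? 0) + 1;
        }
    }
    return $counts;
}

// Average log-probability per transition (closer to 0 = more word-like).
// Add-one smoothing over 27 possible next symbols (a-z plus the end marker)
// keeps unseen transitions from producing log(0).
function markovScore(string $word, array $counts): float
{
    $w = '^' . strtolower(preg_replace('/[^a-z]/i', '', $word)) . '$';
    $n = strlen($w) - 1; // number of transitions, at least 1
    $logp = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $row = $counts[$w[$i]] ?? array();
        $seen = $row[$w[$i + 1]] ?? 0;
        $logp += log(($seen + 1) / (array_sum($row) + 27));
    }
    return $logp / $n; // dividing by length is the normalization step
}

// ASSUMPTION: a toy corpus, far too small for real use.
$counts = trainBigrams(array(
    'name', 'let', 'tablet', 'camel', 'planet',
    'magnet', 'melon', 'lemon', 'tamale', 'omelet',
));

echo markovScore('namelet', $counts) . "\n";
echo markovScore('nameoic', $counts) . "\n";
```

With this toy corpus, namelet scores higher (less negative) than nameoic, because transitions such as "eo", "oi" and "ic" never occur in the training words.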
