Implementing the Soundex algorithm in C++

Posted on 2024-07-17 20:43:03

Put simply, the Soundex algorithm converts a series of characters into a code. Words that produce the same Soundex code are said to sound the same.

  • The code is 4 characters wide
  • The first character of the code is always the first character of the word

Each letter of the alphabet belongs to a particular group (at least in this example and in the code that follows; this is the rule I'll be sticking with):

  • b, p, v, f = 1
  • c, g, j, k, q, s, x, z = 2
  • d, t = 3
  • l = 4
  • m, n = 5
  • r = 6
  • Every other letter in the alphabet belongs to group 0.
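
The grouping above can be written down as a lookup table. This is only a sketch: `soundexGroup` is a hypothetical helper name, it assumes the letters 'a' to 'z' are contiguous (true for ASCII), and treating non-letters as group 0 is an assumption the question does not state.

```cpp
#include <cctype>

// Hypothetical helper: returns the group digit ('0' to '6') for a letter,
// using the groups listed in the question.
char soundexGroup(char c)
{
    // One digit per letter, in the order a b c ... z.
    static const char table[] = "01230120022455012623010202";

    const int lower = std::tolower(static_cast<unsigned char>(c));
    if (lower < 'a' || lower > 'z')
        return '0';   // assumption: anything that is not a letter counts as group 0

    // Assumes 'a'..'z' are contiguous in the character set (true for ASCII).
    return table[lower - 'a'];
}
```

Note that the function returns the character '1', not the integer 1, so the result can be stored directly in a char array that will later be printed.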

Other notable rules include:

  • All letters that belong to group 0 are ignored, unless you run out of letters in the provided word, in which case the rest of the code is filled with 0s.
  • The same digit cannot appear two or more times in a row, so a letter that would repeat the previous digit is ignored. The only exception is the padding with multiple 0s described above.

For example, the word "Ray" produces the Soundex code R000: R is the first character of the provided word; a is part of group 0, so it's ignored; y is part of group 0, so it's ignored; and there are no more characters, so the 3 remaining characters in the code are 0.

I've created a function that takes 1) a 128-character array that is used to create the Soundex code and 2) an empty 5-character array that stores the Soundex code when the function completes (it is passed back by reference, as most arrays in my program are).

My problem, however, is with the conversion process. The logic described above isn't quite working in my code, and I don't know why.

// CREATE A SOUNDEX CODE
// * Parameter list includes the string of characters that are to be converted to code and a variable to save the code respectively.
void SoundsAlike(const char input[], char scode[])
{
    scode[0] = toupper(input[0]); // First character of the string is added to the code

    int matchCount = 1;
    int codeCount = 1;
    while((matchCount < strlen(input)) && (codeCount < 4))
    {
        if(((input[matchCount] == 'b') || (input[matchCount] == 'p') || (input[matchCount] == 'v') || (input[matchCount] == 'f')) && (scode[codeCount-1] != 1))
        {
            scode[codeCount] = 1;
            codeCount++;
        }
        else if(((input[matchCount] == 'c') || (input[matchCount] == 'g') || (input[matchCount] == 'j') || (input[matchCount] == 'k') || (input[matchCount] == 'q') || (input[matchCount] == 's') || (input[matchCount] == 'x') || (input[matchCount] == 'z')) && (scode[codeCount-1] != 2))
        {
            scode[codeCount] = 2;
            codeCount++;
        }
        else if(((input[matchCount] == 'd') || (input[matchCount] == 't')) && (scode[codeCount-1] != 3))
        {
            scode[codeCount] = 3;
            codeCount++;
        }
        else if((input[matchCount] == 'l') && (scode[codeCount-1] != 4))
        {
            scode[codeCount] = 4;
            codeCount++;
        }
        else if(((input[matchCount] == 'm') || (input[matchCount] == 'n')) && (scode[codeCount-1] != 5))
        {
            scode[codeCount] = 5;
            codeCount++;
        }
        else if((input[matchCount] == 'r') && (scode[codeCount-1] != 6))
        {
            scode[codeCount] = 6;
            codeCount++;
        }
        matchCount++;
    }

    while(codeCount < 4)
    {
        scode[codeCount] = 0;
        codeCount++;
    }
    scode[4] = '\0';

    cout << scode << endl;
}

I'm not sure whether it's because of my overuse of strlen, but for some reason, while the program is running the first while loop, none of the characters are actually converted to code (i.e. none of the if statements are actually run).

So what am I doing wrong? Any help would be greatly appreciated.

Comments (4)

还如梦归 2024-07-24 20:43:03

Instead of

scode[codeCount] = 1;

you should write

scode[codeCount] = '1';

because you are building a char array: the former stores the character whose value is 1 (a non-printing ASCII control character), while the latter stores the printable character '1'.
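
A small sketch of why this matters when the array is printed. The helper names below are hypothetical and exist only for this demonstration. The same distinction applies to the padding loop in the question: storing the integer 0 writes a null terminator, so the string ends right after the first letter.

```cpp
#include <cstring>
#include <string>

// Hypothetical demo helpers; the names are illustrative only.

// Pads with the integer 0, as the padding loop in the question does.
std::size_t lengthWhenPaddedWithIntegerZero()
{
    char scode[5] = { 'R', 0, 0, 0, '\0' };   // the integer 0 is the null terminator
    return std::strlen(scode);                // the string stops right after 'R'
}

// Pads with the character literal '0' instead.
std::string codeWhenPaddedWithCharZero()
{
    char scode[5] = { 'R', '0', '0', '0', '\0' };
    return scode;                             // all four code characters survive
}

// True when the integer 1 and the character '1' are stored as different values.
bool integerOneDiffersFromCharOne()
{
    const char fromInteger = 1;     // a control character, not a digit
    const char fromLiteral = '1';   // the printable digit
    return fromInteger != fromLiteral;
}
```

With the integer version, `cout << scode` for the word "Ray" would show only "R", which can look as if no conversion happened at all.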

南街女流氓 2024-07-24 20:43:03

Built-in C++ arrays cannot be resized dynamically, which is what you seem to be attempting. You should look into using the std::string class. In essence, your loop becomes something like this:

void Soundex( const string & input, string & output ) {
   for ( int i = 0; i < input.length(); i++ ) {
       char c = input[i];        // get character from input
       if ( c == .... ) {        // if some decision
            output += 'X';       // add some character to output
       }
       else if ( ..... )  {       // more tests
       }
   }
}
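
One way the outline above might be filled in, following the rules as stated in the question. This is a sketch, not the answerer's code: `soundexOf`, the `groups` table, and returning an empty string for empty input are all assumptions. It compares each digit with the last digit already emitted, as the question's code does, which is not the same as the standard American Soundex treatment of vowels between repeated consonants.

```cpp
#include <cctype>
#include <string>

// Sketch of the question's rules using std::string; soundexOf is a hypothetical name.
std::string soundexOf(const std::string& input)
{
    if (input.empty())
        return "";   // assumption: empty input yields an empty code

    // groups[g] holds the letters of group g + 1, as listed in the question.
    static const char* const groups[] = { "bpvf", "cgjkqsxz", "dt", "l", "mn", "r" };

    // The first character of the code is the first character of the word.
    std::string code(1, static_cast<char>(std::toupper(static_cast<unsigned char>(input[0]))));

    for (std::string::size_type i = 1; i < input.length() && code.length() < 4; ++i)
    {
        const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(input[i])));
        for (int g = 0; g < 6; ++g)
        {
            if (std::string(groups[g]).find(c) != std::string::npos)
            {
                const char digit = static_cast<char>('1' + g);
                if (code.back() != digit)   // skip a digit that repeats the previous one
                    code += digit;
                break;
            }
        }
        // Letters found in no group belong to group 0 and are simply skipped.
    }

    code.append(4 - code.length(), '0');   // pad with the character '0', not the integer 0
    return code;
}
```

For example, `soundexOf("Ray")` gives "R000" and `soundexOf("Robert")` gives "R163" under these rules.
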
因为看清所以看轻 2024-07-24 20:43:03

You are calling strlen() without having added a null terminator to the string, so the return value of strlen() could be anything. You could fix this by filling "scode" with '\0's before you begin, although it would be better to keep a separate counter for that and just add the '\0' when you are done.

黎歌 2024-07-24 20:43:03

This is actually a C implementation and not C++. Anyway, are you sure that your strings are null terminated? Otherwise strlen will not work.

Here is some advice that will make your code easier to read and debug:

  • Convert your input to lower case before starting. Test for illegal characters.
  • Define a variable, set it to input[matchCount], and use that. It will make the code more readable.
  • I would recommend replacing the if-else statements with a switch-case.
  • Handle the default case (when none of the if-else or case branches is taken).
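
Applied to the char-array interface from the question, these suggestions might look like the sketch below. The name `soundexCode`, the bool return value, and rejecting any non-letter as illegal are assumptions made for illustration; `scode` must have room for 5 characters and `input` must be null-terminated.

```cpp
#include <cctype>
#include <cstring>

// Sketch only: returns false (and an empty code) if input is empty or contains a non-letter.
bool soundexCode(const char input[], char scode[])
{
    const std::size_t length = std::strlen(input);   // computed once; needs a null-terminated input
    scode[0] = '\0';
    if (length == 0)
        return false;
    for (std::size_t i = 0; i < length; ++i)         // test for illegal characters
        if (!std::isalpha(static_cast<unsigned char>(input[i])))
            return false;

    scode[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(input[0])));
    int codeCount = 1;

    for (std::size_t i = 1; i < length && codeCount < 4; ++i)
    {
        // One named variable instead of repeating input[matchCount] everywhere.
        const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(input[i])));
        char digit;
        switch (c)
        {
            case 'b': case 'p': case 'v': case 'f': digit = '1'; break;
            case 'c': case 'g': case 'j': case 'k':
            case 'q': case 's': case 'x': case 'z': digit = '2'; break;
            case 'd': case 't':                     digit = '3'; break;
            case 'l':                               digit = '4'; break;
            case 'm': case 'n':                     digit = '5'; break;
            case 'r':                               digit = '6'; break;
            default:                                digit = '0'; break;   // group 0: ignored
        }
        if (digit != '0' && scode[codeCount - 1] != digit)
            scode[codeCount++] = digit;
    }

    while (codeCount < 4)
        scode[codeCount++] = '0';   // pad with the character '0'
    scode[4] = '\0';
    return true;
}
```

A caller would declare `char code[5];` and check the return value before printing `code`.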