Implementing the Soundex algorithm in C++
Put simply, the Soundex algorithm converts a series of characters into a code. Words that produce the same Soundex code are said to sound the same.
- The code is 4 characters wide
- The first character of the code is always the first character of the word
Each letter of the alphabet belongs to a particular group (at least in this example and in the code that follows, this is the rule I'll be sticking with):
- b, p, v, f = 1
- c, g, j, k, q, s, x, z = 2
- d, t = 3
- l = 4
- m, n = 5
- r = 6
- Every other letter in the alphabet belongs to group 0.
Other notable rules include:
- All letters that belong to group 0 are ignored UNLESS you have run out of letters in the provided word, in which case the rest of the code is filled with 0s.
- The same digit cannot appear two or more times in a row, so a character that would repeat the previous digit is ignored. The only exception is the rule above about padding with multiple 0s.
For example, the word "Ray" will produce the following Soundex code: R000 (R is the first character of the provided word, a is part of group 0 so it's ignored, y is part of group 0 so it's ignored, and there are no more characters, so the 3 remaining characters in the code are 0).
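The grouping above can be written as a single lookup, which keeps the letter-to-digit table in one place. This is only a minimal sketch: `soundexGroup` and `checkRayExample` are hypothetical names that do not appear in the original code, and returning the group as a digit character rather than a number is an assumption made so the result can be stored directly in a char array.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: returns the Soundex group of a letter as a digit
// character ('0'..'6'), using the grouping listed in the question.
char soundexGroup(char c)
{
    // Cast to unsigned char first: passing a negative value to tolower is undefined.
    switch (std::tolower(static_cast<unsigned char>(c)))
    {
        case 'b': case 'p': case 'v': case 'f':
            return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z':
            return '2';
        case 'd': case 't':
            return '3';
        case 'l':
            return '4';
        case 'm': case 'n':
            return '5';
        case 'r':
            return '6';
        default:
            return '0'; // every other letter, and anything that is not a letter
    }
}

// Self-check mirroring the "Ray" example: 'a' and 'y' are both in group 0,
// so neither contributes a digit to the code.
void checkRayExample()
{
    assert(soundexGroup('a') == '0');
    assert(soundexGroup('y') == '0');
}
```

Because the lookup lowercases its argument, `soundexGroup('R')` and `soundexGroup('r')` both give '6'.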
I've created a function that is passed 1) a 128-character array that is used to create the Soundex code and 2) an empty 5-character array that will store the Soundex code when the function completes (and is passed back by reference, as most arrays are, for use in my program).
My problem, however, is with the conversion process. The logic I've described above isn't quite working in my code, and I do not know why.
// CREATE A SOUNDEX CODE
// * Parameter list includes the string of characters that are to be converted to code and a variable to save the code respectively.
void SoundsAlike(const char input[], char scode[])
{
    scode[0] = toupper(input[0]); // First character of the string is added to the code
    int matchCount = 1;
    int codeCount = 1;
    while((matchCount < strlen(input)) && (codeCount < 4))
    {
        if(((input[matchCount] == 'b') || (input[matchCount] == 'p') || (input[matchCount] == 'v') || (input[matchCount] == 'f')) && (scode[codeCount-1] != 1))
        {
            scode[codeCount] = 1;
            codeCount++;
        }
        else if(((input[matchCount] == 'c') || (input[matchCount] == 'g') || (input[matchCount] == 'j') || (input[matchCount] == 'k') || (input[matchCount] == 'q') || (input[matchCount] == 's') || (input[matchCount] == 'x') || (input[matchCount] == 'z')) && (scode[codeCount-1] != 2))
        {
            scode[codeCount] = 2;
            codeCount++;
        }
        else if(((input[matchCount] == 'd') || (input[matchCount] == 't')) && (scode[codeCount-1] != 3))
        {
            scode[codeCount] = 3;
            codeCount++;
        }
        else if((input[matchCount] == 'l') && (scode[codeCount-1] != 4))
        {
            scode[codeCount] = 4;
            codeCount++;
        }
        else if(((input[matchCount] == 'm') || (input[matchCount] == 'n')) && (scode[codeCount-1] != 5))
        {
            scode[codeCount] = 5;
            codeCount++;
        }
        else if((input[matchCount] == 'r') && (scode[codeCount-1] != 6))
        {
            scode[codeCount] = 6;
            codeCount++;
        }
        matchCount++;
    }
    while(codeCount < 4)
    {
        scode[codeCount] = 0;
        codeCount++;
    }
    scode[4] = '\0';
    cout << scode << endl;
}
I'm not sure if it's because of my overuse of strlen, but for some reason, while the program is running inside the first while loop, none of the characters are actually converted to code (i.e. none of the if statements are actually run).
So what am I doing wrong? Any help would be greatly appreciated.
Comments (4)
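Instead of
    scode[codeCount] = 1;
you should write
    scode[codeCount] = '1';
because you are building a char array: the former stores the character whose ASCII code is 1 (a non-printing control character), while the latter stores the character '1'. The same applies to the other digits and to the 0 used for padding.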
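The distinction between the number 1 and the character '1' can be shown with a small sketch. `digitChar` is a hypothetical helper, not something from the original code; it relies on the C++ guarantee that the characters '0' through '9' have consecutive values, so adding a small number to '0' gives the matching digit character.

```cpp
#include <cassert>

// Hypothetical helper: turns a group number 0..9 into its digit character.
char digitChar(int group)
{
    assert(group >= 0 && group <= 9); // only single digits map to one character
    return static_cast<char>('0' + group);
}
```

Writing `scode[codeCount] = digitChar(1);` stores the printable character '1', whereas `scode[codeCount] = 1;` stores a control character. Padding with the number 0 is worse still, because that value is the null terminator and ends the string immediately after the first letter.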
C++ does not support dynamic arrays, which you seem to be attempting to use. You need to investigate the use of the std::string class. In essence, your loop becomes something like this:
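The snippet that accompanied this answer is not preserved here. What follows is a minimal sketch of what a std::string-based version might look like, not the answerer's code. The names `groupDigit` and `soundexCode` are hypothetical, the rules are the simplified ones stated in the question, and returning an empty string for an empty word is an assumption, since the question does not cover that case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: Soundex group of a letter as a digit character.
char groupDigit(char c)
{
    switch (std::tolower(static_cast<unsigned char>(c)))
    {
        case 'b': case 'p': case 'v': case 'f': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        default: return '0'; // group 0: ignored while building the code
    }
}

// Builds a 4-character code following the rules listed in the question.
std::string soundexCode(const std::string& input)
{
    if (input.empty())
        return std::string(); // assumption: an empty word yields an empty code

    // The code starts with the first character of the word, uppercased.
    std::string code(1, static_cast<char>(
        std::toupper(static_cast<unsigned char>(input[0]))));

    for (std::size_t i = 1; i < input.size() && code.size() < 4; ++i)
    {
        const char digit = groupDigit(input[i]);
        // Skip group 0 and skip a digit equal to the one just appended.
        if (digit != '0' && digit != code.back())
            code += digit;
    }

    code.append(4 - code.size(), '0'); // pad with '0' characters
    assert(code.size() == 4);
    return code;
}
```

With these rules, `soundexCode("Ray")` returns "R000" and `soundexCode("Robert")` returns "R163". The full Soundex specification has further rules that this sketch deliberately leaves out.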
You are calling strlen() without having added a null terminator to the string, so the return value of strlen() could be anything. You could fix this by filling "scode" with '\0's before you begin, although it would be better to keep a separate counter for that and just add the '\0' when you are done.
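The "separate counter, terminate at the end" approach can be sketched as a small finishing step. `finishCode` is a hypothetical helper, and it assumes the caller has already written `codeCount` characters into a buffer of at least 5 chars.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: pads a partially built code with '0' characters up to
// four characters and then adds the null terminator.
void finishCode(char scode[], int codeCount)
{
    assert(codeCount >= 1 && codeCount <= 4); // at least the first letter is present

    while (codeCount < 4)
        scode[codeCount++] = '0'; // the character '0', not the number 0

    scode[4] = '\0';
    assert(std::strlen(scode) == 4); // strlen is only meaningful once terminated
}
```

Until the terminator has been written, neither strlen() nor printing the buffer with cout is safe, because both keep reading until they reach a '\0'.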
This is actually a C implementation rather than C++. Anyway, are you sure that your strings are null-terminated? Otherwise strlen will not work.
Here is some advice that will make your code easier to read and debug:
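The list that followed this answer is not preserved here. As an illustration only, the sketch below applies a few tidy-ups that are assumptions of this rewrite rather than the answerer's points: move the letter table into a helper, call strlen() once, and store digit characters. `codeFor` is a hypothetical name, the function keeps the original char-array interface but no longer prints, and treating an empty input as "0000" is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: Soundex group of a letter as a digit character.
char codeFor(char c)
{
    switch (std::tolower(static_cast<unsigned char>(c)))
    {
        case 'b': case 'p': case 'v': case 'f': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        default: return '0';
    }
}

// input must be null-terminated; scode must have room for 5 chars.
void SoundsAlike(const char input[], char scode[])
{
    assert(input != nullptr && scode != nullptr);

    const std::size_t length = std::strlen(input); // measured once, not per iteration
    std::size_t codeCount = 0;

    if (length > 0)
        scode[codeCount++] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(input[0])));

    for (std::size_t i = 1; i < length && codeCount < 4; ++i)
    {
        const char digit = codeFor(input[i]);
        // Ignore group 0 and any digit that repeats the previous code character.
        if (digit != '0' && digit != scode[codeCount - 1])
            scode[codeCount++] = digit;
    }

    while (codeCount < 4)
        scode[codeCount++] = '0'; // pad with the character '0'

    scode[4] = '\0';
}
```

Called as `char code[5]; SoundsAlike("Ray", code);`, this leaves "R000" in `code`, which the caller can then print or compare.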