Soundex algorithm in Python (homework help request)
The U.S. Census Bureau uses a special encoding called “soundex” to locate information about a person. The soundex is an encoding of surnames (last names) based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings.
In this lab you will design, code, and document a program that produces the soundex code for a given surname. The user will be prompted for a surname, and the program should output the corresponding code.
Basic Soundex Coding Rules
Every soundex encoding of a surname consists of a letter and three numbers. The letter used is always the first letter of the surname. The numbers are assigned to the remaining letters of the surname according to the soundex guide shown below. Zeroes are added at the end if necessary to always produce a four-character code. Additional letters are disregarded.
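The padding and truncation step described above can be sketched on its own. This is a minimal illustration, assuming the digits for the rest of the surname have already been worked out; `finish_code` is a hypothetical helper name, not part of the assignment.

```python
def finish_code(first_letter, digits):
    """Build the four-character code: one letter plus exactly three digits.

    `digits` is assumed to be a string of Soundex digits for the letters
    after the first one (hypothetical input for this sketch).
    """
    # Keep at most three digits, then pad on the right with zeros.
    return first_letter.upper() + digits[:3].ljust(3, "0")


print(finish_code("J", "25"))    # too short, so a zero is added
print(finish_code("G", "3622"))  # too long, so the extra digit is dropped
```

Slicing before padding means the same line covers both the "add zeroes" and the "additional letters are disregarded" cases.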
Soundex Coding Guide
Soundex assigns a number to each of the following consonants. Consonants that sound alike are assigned the same number:
Number  Consonants
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R
Soundex disregards the letters A, E, I, O, U, H, W, and Y.
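One way to hold the coding guide is a dictionary lookup instead of a chain of `if`/`elif` tests. The sketch below is only a suggestion; the names `GROUPS`, `DIGIT_FOR`, and `digit_for` are made up for illustration.

```python
# Digit groups copied from the coding guide above.
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the table so each consonant maps straight to its digit.
DIGIT_FOR = {
    letter: digit
    for digit, letters in GROUPS.items()
    for letter in letters
}


def digit_for(letter):
    """Return the Soundex digit for a letter, or "" if it is disregarded."""
    # A, E, I, O, U, H, W and Y are not in the table, so they yield "".
    return DIGIT_FOR.get(letter.upper(), "")


print(digit_for("b"), digit_for("Z"), repr(digit_for("A")))
```

Returning an empty string for disregarded letters makes it easy to tell "no code" apart from a real digit later on.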
Three additional Soundex coding rules also apply. A good program design would implement each of them as one or more separate functions.
Rule 1. Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
Gutierrez is coded G362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
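Rule 1 can be written as a small function that works on the letters alone, before any digits are assigned. This is a sketch under that assumption; `collapse_doubles` is a hypothetical name.

```python
def collapse_doubles(name):
    """Treat a run of the same letter as a single letter (Rule 1)."""
    result = []
    for ch in name.upper():
        # Skip a letter that repeats the one immediately before it.
        if result and result[-1] == ch:
            continue
        result.append(ch)
    return "".join(result)


print(collapse_doubles("Gutierrez"))
```

For "Gutierrez" this gives "GUTIEREZ", so only one R is left to be coded as 6.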
Rule 2. Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
Pfister is coded as P236 (P, F ignored since it is considered the same as P, 2 for the S, 3 for the T, 6 for the R).
Jackson is coded as J250 (J, 2 for the C, K ignored since it is the same as C, S ignored since it is the same as C, 5 for the N, 0 added).
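Rule 2 amounts to remembering the digit of the letter immediately before the current one and skipping the current letter when its digit is the same. The sketch below assumes a non-empty name and deliberately ignores the separator rule, so H and W are treated like any other disregarded letter here; `CODES` and `merge_adjacent` are illustrative names.

```python
# Letter-to-digit table built from the coding guide.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def merge_adjacent(name):
    """Return the digits after the first letter, merging side-by-side
    letters that share a digit (Rule 2). Assumes `name` is not empty."""
    name = name.upper()
    digits = []
    # The first letter counts too: in Pfister, F is dropped because of P.
    previous = CODES.get(name[0], "")
    for ch in name[1:]:
        current = CODES.get(ch, "")
        # Code the letter only if it has a digit that differs from the
        # digit of the letter directly before it.
        if current and current != previous:
            digits.append(current)
        previous = current
    return "".join(digits)


print(merge_adjacent("Pfister"), merge_adjacent("Jackson"))
```

Because identical letters trivially share a digit, this also covers the double letters of Rule 1. On its own it would turn Ashcraft into digits starting 226, which is why the H and W case needs separate handling.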
Rule 3. Consonant Separators
3.a. If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
Tymczak is coded as T522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
3.b. If "H" or "W" separates two consonants that have the same soundex code, the consonant to the right is not coded. Example:
Ashcraft is coded A261 (A, 2 for the S, C ignored since it is the same as S with H in between, 6 for the R, 1 for the F). It is not coded A226.
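Putting the separator rule together with the earlier rules, one possible complete encoder looks like the sketch below. It is not the assignment's reference solution. It assumes ASCII surnames, treats Y and any non-letter characters the same way as vowels (the rules above do not say how to handle them), and uses the made-up names `CODES` and `soundex`.

```python
# Letter-to-digit table built from the coding guide.
CODES = {
    letter: digit
    for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                           ("4", "L"), ("5", "MN"), ("6", "R"))
    for letter in letters
}


def soundex(surname):
    """Return a four-character Soundex code for `surname` (a sketch)."""
    name = surname.upper()
    if not name or not name[0].isalpha():
        raise ValueError("surname must start with a letter")

    digits = []
    previous = CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            # Rule 3.b: H and W do not break a run of same-coded
            # consonants, so `previous` is left untouched.
            continue
        current = CODES.get(ch, "")
        # Rules 1 and 2: skip a letter whose digit repeats `previous`.
        if current and current != previous:
            digits.append(current)
        # Rule 3.a: a vowel resets `previous` to "", so the next
        # consonant is coded even if it matches the one before the vowel.
        previous = current

    # One letter plus exactly three digits, zero-padded or truncated.
    return name[0] + "".join(digits)[:3].ljust(3, "0")


for example in ("Gutierrez", "Pfister", "Jackson", "Tymczak", "Ashcraft"):
    print(example, soundex(example))
```

The loop reproduces the codes given in the examples above (G362, P236, J250, T522, A261). Splitting it into one function per rule, as the assignment suggests, would mainly mean separating the H/W check and the `previous` comparison into their own helpers.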
So far this is my code:
# Python 3: input() and print() replace raw_input and the print statement.
surname = input("Please enter surname: ").upper()  # the table uses uppercase letters
outstring = surname[0]  # the code always starts with the first letter
for i in range(1, len(surname)):
    nextletter = surname[i]
    if nextletter in ['B', 'F', 'P', 'V']:
        outstring = outstring + '1'
    elif nextletter in ['C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z']:
        outstring = outstring + '2'
    elif nextletter in ['D', 'T']:
        outstring = outstring + '3'
    elif nextletter in ['L']:
        outstring = outstring + '4'
    elif nextletter in ['M', 'N']:
        outstring = outstring + '5'
    elif nextletter in ['R']:
        outstring = outstring + '6'
    # A, E, I, O, U, H, W and Y match no branch and are skipped.
print(outstring)
It does the basic encoding adequately; I am just not sure how to code the three rules. That is where I need help, so any help is appreciated.
Comments (3)
I would suggest you try the following.
Once you break it down nicely it should become easier to manage.
This is hardly perfect (for instance, it produces the wrong result if the input doesn't start with a letter), and it doesn't implement the rules as independently-testable functions, so it's not really going to serve as an answer to the homework question. But this is how I'd implement it: