SAS: generate all possible misspellings
Does anyone know how to generate the possible misspellings of a word?
Example: unemployment
- uemployment
- onemploymnet
- etc.
If you just want to generate a list of possible misspellings, you might try an existing misspelling-generator tool. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.
Here is an example that computes the generalized edit distance between "unemployment" and a variety of plausible misspellings.
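For readers working outside SAS, here is a minimal Python sketch of the same idea. It assumes plain Levenshtein distance (each insertion, deletion, or substitution costs 1), which is not the same as COMPGED's cost table, and it uses a hypothetical `max_distance` cutoff to decide when the entered text is "close enough" to be replaced.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


def correct(entered: str, target: str, max_distance: int = 2) -> str:
    """Return target if entered is within max_distance edits of it
    (case-insensitive); otherwise return entered unchanged."""
    if edit_distance(entered.lower(), target.lower()) <= max_distance:
        return target
    return entered


if __name__ == "__main__":
    for typo in ["uemployment", "onemploymnet", "unemploymet", "enjoyment"]:
        print(typo, edit_distance(typo, "unemployment"), correct(typo, "unemployment"))
```

The threshold is the "by your standard" part. Note that a loose cutoff will also "correct" legitimate words that happen to be close, such as "employment".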
Essentially, you are trying to develop a list of text strings based on some rules of thumb, such as a letter missing from the word, a letter placed in the wrong spot, or a letter mistyped. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario, then this might be manageable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, every combination of letters other than "unemployment" is a misspelling of that word).
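Under the one-wrong-letter assumption, the candidate list can be generated mechanically. The Python sketch below assumes four rules (delete one letter, swap two adjacent letters, replace one letter, insert one letter) over a lowercase a to z alphabet; `one_edit_misspellings` is a hypothetical helper name, and the rule set is only one possible definition of "one wrong letter".

```python
import string


def one_edit_misspellings(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings exactly one delete, adjacent swap, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    variants = deletes | swaps | replaces | inserts
    variants.discard(word)  # replacing a letter with itself reproduces the word
    return variants


if __name__ == "__main__":
    ones = one_edit_misspellings("unemployment")
    print(len(ones), "one-edit variants")
    # Applying the rules twice shows how quickly the list explodes.
    twos = {w2 for w1 in ones for w2 in one_edit_misspellings(w1)}
    print(len(twos), "two-edit variants")
```

"onemploymnet" from the question needs two edits (a replacement plus a swap), so it only appears in the second, much larger set.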
Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
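Along the lines of the external-process suggestion, here is a small Python script (standing in for Perl) that prints single-letter deletions and adjacent-letter swaps one per line, so its output can be redirected to a text file, e.g. `python misspell.py unemployment > misspellings.txt`. Both file names are hypothetical, and the two rules are a deliberately reduced set. The resulting file can then be read into SAS with a DATA step using INFILE and INPUT.

```python
import sys


def simple_variants(word: str) -> list[str]:
    """Single-letter deletions and adjacent swaps of word, sorted."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])  # drop letter i
        if i + 1 < len(word):
            # swap letters i and i+1
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)  # swapping two identical letters gives the word back
    return sorted(variants)


if __name__ == "__main__":
    # One misspelling per line on stdout; redirect to a file for SAS.
    for word in sys.argv[1:] or ["unemployment"]:
        for variant in simple_variants(word):
            print(variant)
```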
If you are looking for a general spell checker, SAS does have proc spell. It will take some tweaking to get it working for your situation; it is very old and clunky. It doesn't work well in this case, but you may get better results if you use a different dictionary. A Google search will turn up other examples.
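To illustrate the dictionary-lookup approach in a general-purpose language, the sketch below uses Python's standard `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by edit distance. The five-word `DICTIONARY` and the `check` helper are hypothetical placeholders; swapping in a larger word list plays the role of trying another dictionary.

```python
import difflib

# Hypothetical mini-dictionary; a real checker would load a full word list.
DICTIONARY = ["employment", "unemployment", "enjoyment", "employer", "payment"]


def check(word: str, dictionary: list[str] = DICTIONARY, n: int = 3, cutoff: float = 0.8) -> list[str]:
    """Return [] if word is in the dictionary; otherwise up to n suggestions
    whose similarity ratio with word is at least cutoff, best first."""
    word = word.lower()
    if word in dictionary:
        return []
    return difflib.get_close_matches(word, dictionary, n=n, cutoff=cutoff)


if __name__ == "__main__":
    for entered in ["unemployment", "onemploymnet", "uemployment"]:
        print(entered, check(entered))
```

Raising `cutoff` makes suggestions stricter; lowering it lets more distant words through.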