SAS: generating all possible misspellings

Posted on 2024-11-01 09:35:12

Does anyone know how to generate the possible misspellings of a word?

Example: unemployment
- uemployment
- onemploymnet
- etc.

Comments (3)

兔小萌 2024-11-08 09:35:12

If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.

Here is an example that computes the Generalized Edit Distance between "unemployment" and a variety of plausible misspellings.

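data misspell;
  /* Declare lengths before INPUT so they take effect without a warning */
  length misspell string $16;
  retain string "unemployment";
  input misspell $16.;
  /* 'i' ignores case, 'L' removes leading blanks before comparing */
  GED=compged(misspell, string,'iL');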
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
   label GED='Generalized Edit Distance';
   var misspell string GED;
run;
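
The "close enough, then replace" idea can also be sketched outside SAS. The Python below is only an illustration: it uses plain Levenshtein distance (unit cost per insert, delete, or replace), not COMPGED's own cost table, so its numbers will not match the GED column above. The names `edit_distance`, `correct`, and the `max_distance` threshold are hypothetical, and the strip/lowercase step only loosely mirrors the 'iL' modifiers.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: unit cost for insert, delete, replace."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace, or match at no cost
        prev = curr
    return prev[-1]


def correct(entered, target, max_distance=2):
    """Return target if entered is within max_distance edits, else entered unchanged."""
    cleaned = entered.strip().lower()  # assumption: ignore case and surrounding blanks
    if edit_distance(cleaned, target.lower()) <= max_distance:
        return target
    return entered
```

With the default threshold, `correct("uemployment", "unemployment")` returns the intended word, while `correct("blahblah", "unemployment")` leaves the input alone. Where to put the threshold is a judgment call that depends on your data.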

疑心病 2024-11-08 09:35:12

Essentially, you are trying to build a list of text strings from some rule of thumb: a letter is missing from the word, a letter is in the wrong position, a letter was mistyped, and so on. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement reduces to the one-wrong-letter scenario, it might be manageable; otherwise, the commenters are correct, and you can easily create massive lists of incorrect spellings (after all, every combination of letters except "unemployment" is a misspelling of that word).

Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
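
As a rough illustration of the external-process idea, here is a minimal Python sketch (Python rather than Perl is my substitution). It assumes the rules are exactly four single-character edits: drop a letter, swap two adjacent letters, replace a letter, or insert a letter. The function names and the lowercase a-z alphabet are hypothetical choices, not anything prescribed in this thread.

```python
import string


def single_edit_misspellings(word, alphabet=string.ascii_lowercase):
    """Return the sorted, de-duplicated strings exactly one edit away from word."""
    # Every way to cut the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}                  # missing letter
    transposes = {a + b[1] + b[0] + b[2:]
                  for a, b in splits if len(b) > 1}                # swapped neighbours
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}  # mistyped letter
    inserts = {a + c + b for a, b in splits for c in alphabet}     # extra letter
    variants = deletes | transposes | replaces | inserts
    variants.discard(word)  # the correct spelling is not a misspelling
    return sorted(variants)


def write_variants(word, path):
    """Write one variant per line to path and return how many were written."""
    variants = single_edit_misspellings(word)
    with open(path, "w", encoding="ascii") as fh:
        fh.write("\n".join(variants) + "\n")
    return len(variants)
```

Calling `write_variants("unemployment", "misspellings.txt")` (the file name is just a placeholder) produces a one-word-per-line text file that a DATA step can read with an INFILE statement. Note that "uemployment" from the question is one edit away and appears in the list, but "onemploymnet" needs more than one edit and does not, which shows how quickly the list grows once more than one error is allowed.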

普罗旺斯的薰衣草 2024-11-08 09:35:12

If you are looking for a general spell checker, SAS does have proc spell.

It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may get better results with another dictionary. A Google search will show other examples.

filename name temp lrecl=256;
options caps;

data _null_;
  file name;
  informat name $256.;
  input name &;
  put name;
  cards;
uemployment 
onemploymnet 
;

proc spell in=name
  dictionary=SASHELP.BASE.NAMES
  suggest;
run;

options nocaps;