Grouping similar strings in a column in SAS

Posted 2025-02-12 22:41:38

I have the following table:

Name
----
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James

Using SAS, how can I group these strings on a large scale to identify those that are similar, so that I can get this table:

Name
----
John Smith
John Smth
Timothy Brown
Timmothy Brown

Answer by ㄟ。诗瑗, 2025-02-19 22:41:38

There are many ways to compare strings in SAS. A simple one is the SOUNDEX function, which encodes a string by how it sounds, so names that sound alike get the same code.

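data have;
  input Name $char20.;
  datalines;
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
;

proc sql;
  /* Pair every name with every other name and keep the pairs whose SOUNDEX codes match */
  create table want as
  select 
    A.name
  , B.name as name2
  , soundex(A.name) as sxname
  , soundex(B.name) as sxname2
  from have a
  cross join 
  have b
  where a.name lt b.name   /* keep each pair once and skip self-matches */
  having sxname = sxname2  /* compare the computed SOUNDEX codes */
  ;
quit;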
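The query above returns pairs, while the question asks for a single Name column. A minimal sketch of one way to get that table is below. It keeps every name that shares its SOUNDEX code with at least one other name. The output table name SIMILAR_NAMES is a hypothetical choice made for illustration.

```sas
/* Sample data, as in the answer above */
data have;
  input Name $char20.;
  datalines;
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
;

proc sql;
  /* Keep each name whose SOUNDEX code matches that of at least one
     other name; SIMILAR_NAMES is a hypothetical table name */
  create table similar_names as
  select distinct a.name
  from have as a
  cross join have as b
  where a.name ne b.name
    and soundex(a.name) = soundex(b.name)
  order by a.name
  ;
quit;
```

On the sample data, this should return John Smith, John Smth, Timmothy Brown and Timothy Brown. Apart from row order, that is the table in the question. The cross join compares every pair of rows, so the work grows with the square of the number of names.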

Other techniques use a matching criterion based on a metric such as the Levenshtein edit distance, which the COMPLEV function computes. The SPEDIS function, which returns a spelling distance between two strings, is also worth learning about.
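A Levenshtein-based version of the same self-join might look like the sketch below. It keeps the pairs whose COMPLEV distance is at most 1. That cutoff is an assumption, picked only because it reproduces the question's desired output on this sample. The table name WANT_LEV and the column EDITDIST are hypothetical.

```sas
/* Sample data, as in the answer above */
data have;
  input Name $char20.;
  datalines;
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
;

proc sql;
  /* Pair each name with every later name and keep pairs whose
     Levenshtein edit distance is at most 1 (an assumed cutoff) */
  create table want_lev as
  select a.name
       , b.name as name2
       , complev(strip(a.name), strip(b.name)) as editdist
  from have as a
  cross join have as b
  where a.name lt b.name
    and calculated editdist le 1
  order by a.name
  ;
quit;
```

Jane Lee and Jane Line are two edits apart. A cutoff of 2 would therefore pair them too, which the question wants to avoid. The threshold is something to tune against your own data.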

Search for "How to perform a fuzzy match using SAS functions" and you will find plenty to chew on. Keep an eye out for papers by Charles Patridge.
