How can I use SAS to remove a generational suffix (if present) from a last name?

Posted 2025-01-22 08:55:51 · 1312 characters · 3 views · 0 comments

In my dataset, the last name (lname) occasionally has the generational suffix attached. Regarding the generational suffix:

there are no spaces or other possible delimiters between the lname variable and the suffix
the suffix ranges between 2 and 4 characters in length
the suffix is a mix of lowercase, uppercase, and proper case
the suffix sometimes includes a combination of integers and characters

I tried to think of simple solutions first. I couldn't come up with any in Excel, because all of its string functions require the values to be removed to sit at a consistent position.

In SAS, PARSE requires a delimiter, and TRIM requires a consistent position.

In the syntax I've attached are four different approaches I tried. None of them were successful, and I totally admit user error. I'm not familiar with any of them other than COMPRESS, and then only for removing blanks.

Is there a way I can make a new variable for last name that doesn't have the generational suffix attached?

Thank you so much!

This first piece applies to each of my attempts.

data have;
    input id lname $ fname $;
    datalines;
        123456  Smith       John
        234567  SMITH       ANDREW
        345678  SmithJr     Alan
        456789  SMITHSR     SAM
        789012  smithiii    robert
        890123  smithIIII   william
        901234  Smith4th    Tim
        ;
run;

My attempts start here.

/* COMPRESS */
data want;
    set have;
    lname2 = compress(lname,'Jr');
    put string=;
run;

/* TRANWRD */
data want;
    set have;
    lname2 = tranwrd(lname,"Jr", "");
    lname2 = tranwrd(lname,"Sr", "");
    lname2 = tranwrd(lname,"III", "");
run;

/* PRXCHANGE */
data want;
    set have;
    lname2 = lname;
    lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/i',1,trim(lname));
run;

/* PRXMATCH */
data want;
    set have;
    if prxmatch('/Jr|Sr|III/',lname) then lname2 = '';
run;

感情洁癖 2025-01-29 08:55:51

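You cannot use COMPRESS for this at all: it removes the listed characters one by one wherever they occur, not a substring.
TRANWRD does not solve it either. It needs no delimiter, but it is case-sensitive and replaces the substring wherever it occurs, so it would also alter a match at the beginning or in the middle of the name. TRANSLATE has the same problem, since it substitutes single characters.
An example using PRXMATCH is below.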

    data have;
        input id lname $ fname $;
        datalines;
            123456  Smith       John
            234567  SMITH       ANDREW
            345678  SmithJr     Alan
            456789  SMITHSR     SAM
            789012  smithiii    robert
            890123  smithIIII   william
            901234  Smith4th    Tim
            901235  SRith4th    Tim
            ;
    run;
         
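    data want;
        set have;
        /* lname has the default length of 8 from list input. */
        length lname2 $ 8;
    
          /* Use PRXPARSE to compile the Perl regular expression.    */
       patternID=prxparse('/(JR$)|(SR$)|(III$)/');
          /* Use PRXMATCH to find the position of the pattern match  */
          /* (0 when the name does not end in one of the suffixes).  */
       position=prxmatch(patternID, trim(upcase(lname)));
          /* Keep only the characters in front of the suffix.        */
       if position > 1 then lname2 = substr(lname, 1, position - 1);
       else lname2 = lname;
       drop patternID position;
    run;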
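
To see why COMPRESS and TRANWRD are the wrong tools here, a small sketch may help. The two names in it are made up for illustration and are not from the question's data; the variable names c and t are likewise arbitrary.

```sas
data demo;
    length lname $ 20;
    input lname $;
    /* COMPRESS deletes every 'J' and every 'r', wherever they occur,
       so the 'J' and 'r' inside the name itself are lost as well. */
    c = compress(lname, 'Jr');
    /* TRANWRD is case-sensitive and replaces every occurrence of 'Sr',
       including one at the start of the name. A zero-length replacement
       is treated as a single blank, so blanks are left behind. */
    t = tranwrd(lname, 'Sr', '');
    put lname= c= t=;
    datalines;
JarvisJr
SrokaSr
;
run;
```

Neither function can anchor the match to the end of the value, which is what the `$` in the regular expression provides.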

墨落画卷 2025-01-29 08:55:51

I think your PRXCHANGE approach is fine; for me it is the most reliable and the easiest to maintain. I would change just two things:

Use the 'o' modifier so the regex is compiled only once.
Use STRIP instead of TRIM (STRIP removes both leading and trailing blanks, like TRIM(LEFT(...))).

data want;
    set have;
    attrib lname2 format=$50.;
    lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/oi', 1, strip(lname));
run;
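
The pattern above only lists jr, sr, iii and iv, while the sample data also contains IIII and 4th. In addition, list input with a plain `$` gives lname a length of 8, so smithIIII would be truncated on input. A possible extension is sketched below; the length of 20 and the extra alternatives in the pattern are my assumptions about what counts as a suffix, not something the question specifies.

```sas
data have;
    /* Assumed length: long enough that 'smithIIII' is not truncated. */
    length lname $ 20 fname $ 20;
    input id lname $ fname $;
    datalines;
123456  Smith       John
234567  SMITH       ANDREW
345678  SmithJr     Alan
456789  SMITHSR     SAM
789012  smithiii    robert
890123  smithIIII   william
901234  Smith4th    Tim
901235  SRith4th    Tim
;
run;

data want;
    set have;
    length lname2 $ 20;
    /* ^(.+?)  lazily captures the name and requires at least one character.
       The second group is the assumed suffix list: jr, sr, ii to iiii, iv,
       or digits followed by st/nd/rd/th. The trailing $ anchors the suffix
       to the end of the value; 'i' ignores case, 'o' compiles once. */
    lname2 = prxchange('s/^(.+?)(jr|sr|i{2,4}|iv|\d+(st|nd|rd|th))$/$1/io',
                       1, strip(lname));
run;
```

With these rows, lname2 should keep only the Smith or SRith part in its original case. A purely pattern-based rule can still misfire on real surnames that happen to end in one of the listed strings (a name ending in "ii", for example), so it is worth reviewing the rows where lname2 differs from lname.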
