按数字范围合并

发布于 2024-11-08 13:46:24 字数 467 浏览 3 评论 0原文

我想根据范围而不是明确的数字将一个人的名字分配给一个数字。可以使用格式来完成此操作,但由于我在数据集中有名称,所以我希望避免手动编写 proc 格式

data names;                      
   input low high name $;  
   datalines;
1   10  John
11  20  Paul
21  30  George
31  40  Ringo
;

data numbers;
    input number;
    datalines;
33
21
17
5
;

所需的输出是:

data output;
    input number name $;
    datalines;
33  Ringo
21  George
17  Paul
5   John
;

感谢您的帮助。

I'd like to assign a person's name to a number based on a range rather than an explicit number. It's possible to do this using formats, but as I have the names in a dataset I'd prefer to avoid the manual process of writing the proc format.

data names;                      
   input low high name $;  
   datalines;
1   10  John
11  20  Paul
21  30  George
31  40  Ringo
;

data numbers;
    input number;
    datalines;
33
21
17
5
;

The desired output is:

data output;
    input number name $;
    datalines;
33  Ringo
21  George
17  Paul
5   John
;

Thanks for any help.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

天涯离梦残月幽梦 2024-11-15 13:46:24

您可以使用 PROC SQL 执行此操作:

proc sql;
 create table output as
 select numbers.number, names.name
 from numbers left join names
  on numbers.number ge names.low
  and numbers.number le names.high
;
quit;

You can do it like this using PROC SQL:

proc sql;
 create table output as
 select numbers.number, names.name
 from numbers left join names
  on numbers.number ge names.low
  and numbers.number le names.high
;
quit;
关于从前 2024-11-15 13:46:24

proc format 的一项便利功能是能够使用数据集来创建格式,而不是手动输入。您的场景似乎是此功能的完美场景。

在您给出的示例中,对“名称”数据集进行一些小更改会将其置于可以通过 proc 格式读取的形式。

例如,如果我像这样修改名称数据集......

data names;
   retain fmtname "names" type "N";
   input start end label $; 
   datalines;
1   10  John
11  20  Paul
21  30  George
31  40  Ringo
;

然后我可以发出此命令来基于它构建格式。

proc format cntlin=names;run;

现在我可以使用这种格式,就像使用任何其他格式一样。例如,要创建一个包含基于数字的所需“名称”的新列,您可以执行以下操作:

data numbers;
    input number;
    number_formatted=put(number,names.);
    datalines;
33
21
17
5
;

输出如下:

             number_
  number    formatted

    33       Ringo
    21       George
    17       Paul
     5       John

更新以解决问题:

没有从文本文件中读取所需的编码有很大差异。我们只需对其进行设置,以便输出数据集具有 proc 格式所需的特定变量名称(fmtname、type、start、end 和 label)。

例如,如果我有一个名为“names.csv”的外部逗号分隔文件,如下所示:

1,10,John
11,20,Paul
21,30,George
31,40,Ringo

然后我只需更改创建“names”数据集的代码,使其如下所示:

data names;
   retain fmtname "names" type "N";
   infile "<path to file>/names.csv" dsd;
   input start end label $;
run;

现在我可以运行 proc像我之前一样使用 cntlin 选项格式化:

proc format cntlin=names;run;

One handy feature of proc format is the ability to use a data set to create the format, instead of typing it in by hand. Your scenario seems like a perfect scenario for this feature.

In the example you give, a few small changes to the "names" data set will put it in a form that can be read by proc format.

For example, if I modify the names data set like so..

data names;
   retain fmtname "names" type "N";
   input start end label $; 
   datalines;
1   10  John
11  20  Paul
21  30  George
31  40  Ringo
;

I can then issue this command to build the format based on it.

proc format cntlin=names;run;

Now I can use this format just like you would with any other format. For example, to create a new column that contains the desired "name" based on the number, you could do this:

data numbers;
    input number;
    number_formatted=put(number,names.);
    datalines;
33
21
17
5
;

Here is what the output would look like:

             number_
  number    formatted

    33       Ringo
    21       George
    17       Paul
     5       John

Update to address question:

There isn't much difference in coding needed to read from a text file. We just need to set it up so that the output data set has the particular variable names that proc format expects (fmtname, type, start, end , and label).

For example, if I have an external comma-seperated file called "names.csv" that looks like this:

1,10,John
11,20,Paul
21,30,George
31,40,Ringo

Then I simply can change the code that creates the "names" data set so that it looks like this:

data names;
   retain fmtname "names" type "N";
   infile "<path to file>/names.csv" dsd;
   input start end label $;
run;

Now I can run proc format with the cntlin option like I did before:

proc format cntlin=names;run;
新雨望断虹 2024-11-15 13:46:24

我认为 SQL 确实更简洁,但如果您不是它的忠实粉丝并且数字以已知的增量出现,您可以尝试类似的操作:

data ranges;
    set names;
    do number = low to high; /* by ... */
        output;
    end;
proc sort;
    by number;
run;

data output;
    merge ranges
        numbers ( in = innum )
    ;
    by number;
    keep number name;

    if innum;
run;

再次,它要求数字以预定的增量出现,例如整数。

I think SQL is more succinct indeed, but if you aren't big fan of it and the numbers come in known increments, you may try something like:

data ranges;
    set names;
    do number = low to high; /* by ... */
        output;
    end;
proc sort;
    by number;
run;

data output;
    merge ranges
        numbers ( in = innum )
    ;
    by number;
    keep number name;

    if innum;
run;

Again, it requires numbers to come in predetermined increments, e.g. integers.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文