Working with data.frames in R (using SAS code to describe what I want)

Posted 2024-08-07 15:35:02


I've been working mostly in SAS lately, but since I don't want to lose what familiarity with R I have, I'd like to replicate something basic I've done. Forgive me if my SAS code isn't perfect; I'm writing it from memory because I don't have SAS at home.

In SAS I have a dataset that looks roughly like the following example (. is SAS's equivalent of NA):

If the dataset above were work.foo, I could do something like the following.

/* create work.bar from dataset work.foo */
data work.bar;
set work.foo;

/* generate a third variable and add it to work.bar */
if a = 0 and b ge 1 then c = 1;
if a = 0 and b = 0  then c = 2;
if a = 1 and b ge 1 then c = 3;
if a = 1 and b = 0  then c = 4;
run;

and I'd get something like

I could then proc sort by c and use c to split the data into 4 subgroups for various operations. For example, I could get the means of each group with

proc sort data=work.bar;
by c;
run;

proc means noprint data=work.bar;
by c;
var a b;
output out=work.means mean(a b)=a b;
run;

and I'd get a dataset of the variables' means by group, called work.means, something like:

I think I may also get a row for c = . (missing), but I don't care about that for my purposes.
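A minimal base-R sketch of both SAS steps above, assuming a small hypothetical work.foo (the example table from the original post did not survive, so these values are invented). Nested ifelse() plays the role of the if/then chain and leaves c as NA when no condition matches, much as SAS leaves it missing; aggregate() then computes the per-group means without a prior sort, and it drops rows whose c is NA, so no "." row appears.

```r
# Hypothetical stand-in for work.foo; NA plays the role of SAS's "."
work.foo <- data.frame(a = c(0, 0, 1, 1, 1, NA),
                       b = c(2, 0, 1, 3, 0, 1))

# data work.bar; set work.foo; if ... then c = ...;
work.bar <- work.foo
work.bar$c <- with(work.bar,
  ifelse(a == 0 & b >= 1, 1,
  ifelse(a == 0 & b == 0, 2,
  ifelse(a == 1 & b >= 1, 3,
  ifelse(a == 1 & b == 0, 4, NA)))))

# proc means; by c; var a b; output out = work.means mean(a b) = a b;
# na.rm = TRUE is passed on to mean(); rows with c = NA are left out
work.means <- aggregate(work.bar[c("a", "b")], by = list(c = work.bar$c),
                        FUN = mean, na.rm = TRUE)
print(work.means)
```

Each ifelse() works on whole columns at once, so there is no row-by-row loop as in a SAS data step.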

Now for R. I have the same dataset, already read in properly, but I have no idea how to add a variable at the end (like c) or how to do an operation on each subgroup (like the by c statement in proc means). I should also note that my variables aren't named in any sort of order, but according to what they represent.
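For the general pattern (add a column, then run any function on each subgroup), one base-R sketch: assigning to a new name with $<- appends the column at the end, and split() breaks the rows into one data frame per group so that sapply() can run the same function on each piece. Columns are picked by name, so their order does not matter. The column names and values here (weight, smoker, visits, group) are hypothetical.

```r
# Hypothetical data with descriptively named columns in no particular order
df <- data.frame(weight = c(70, 82, 65, 90),
                 smoker = c(0, 1, 0, 1),
                 visits = c(3, 0, 2, 5))

# Adding a variable: assigning to a new name appends it as the last column
df$group <- ifelse(df$smoker == 1, "smoker", "non-smoker")

# By-group processing: split() gives one data frame per group,
# then sapply() runs the same function on each piece
pieces <- split(df, df$group)
summary_by_group <- sapply(pieces, function(d)
  c(n = nrow(d), mean_weight = mean(d$weight), mean_visits = mean(d$visits)))
print(summary_by_group)
```

The result is a matrix with one column per group and one row per statistic.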

I figure if somebody can show me how to do the above, I can generalize it to what I need to do.


Answer by 演出会有结束, 2024-08-14 15:35:02


Assume your data set is a two-column dataframe called work.foo with variables a and b. Then the following code is one way to do it in R:

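work.bar <- work.foo
# Each logical test is TRUE/FALSE (1/0), so the weighted sum gives 1 to 4;
# rows matching none of the conditions get 0, and missing a or b can give NA
work.bar$c <- with(work.foo, (a==0 & b>=1) + 2*(a==0 & b==0) + 3*(a==1 & b>=1) +
                   4*(a==1 & b==0))
# Column means of a and b (selected by name) for each value of c
work.mean <- by(work.bar[, c("a", "b")], work.bar$c, colMeans, na.rm = TRUE)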
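The object returned by by() has class "by": roughly a list with one entry per group. A sketch, with hypothetical data and colMeans() applied per group, of flattening it into an ordinary data frame shaped like SAS's work.means:

```r
# Hypothetical work.bar that already carries the group variable c
work.bar <- data.frame(a = c(0, 0, 1, 1, 1),
                       b = c(2, 0, 1, 3, 0),
                       c = c(1, 2, 3, 3, 4))

# One named vector of column means per value of c; the result has class "by"
work.mean <- by(work.bar[, c("a", "b")], work.bar$c, colMeans)
print(class(work.mean))

# Stack the per-group vectors into a matrix, then attach the group labels
means.mat <- do.call(rbind, work.mean)
means.df <- data.frame(c = as.numeric(dimnames(work.mean)[[1]]), means.mat,
                       row.names = NULL)
print(means.df)
```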


Answer by 负佳期, 2024-08-14 15:35:02


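An alternative is ddply() from the plyr package. You don't necessarily even have to create a grouping variable (although it's awfully convenient): you can group on a and b directly, though then every distinct (a, b) pair forms its own group, so b = 1 and b = 2 are not pooled the way c pools them.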

library(plyr)
ddply(work.foo, c("a", "b"), function(x) c(mean.a = mean(x$a, na.rm = TRUE), mean.b = mean(x$b, na.rm = TRUE)))

Of course, if you had created the grouping variable, you'd just call ddply() on work.bar and replace c("a", "b") with "c".

The main advantage, in my mind, is that plyr functions return whatever kind of object you like: ddply takes a data frame and gives you one back, dlply returns a list, and so on. by() and its *apply brethren usually just give you a list, I think.
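Base R's aggregate() is another option that takes a data frame and returns one. A sketch with hypothetical values (x is an invented measurement column) that groups on a and b together, the way the ddply() call above does:

```r
# Hypothetical data; x is an invented measurement column
work.foo <- data.frame(a = c(0, 0, 1, 1, 0),
                       b = c(1, 1, 0, 2, 0),
                       x = c(10, 20, 5, 7, 3))

# Group on a and b together, like ddply(work.foo, c("a", "b"), ...);
# the result is a data frame with one row per (a, b) pair present in the data
by.ab <- aggregate(x ~ a + b, data = work.foo, FUN = mean)
print(by.ab)
```

Note that the formula interface drops rows with NA in any of the named variables before grouping.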
