How do I rewrite this Stata code in R?

Posted on 2024-10-18 02:15:14

One of the things Stata does well is the way it constructs new variables (see the example below). How can I do this in R?

foreach i in A B C D {
    forval n = 1990/2000 {
        local m = `n' - 1
        // create new columns from existing ones on-the-fly
        generate pop`i'`n' = pop`i'`m' * (1 + trend`n')
    }
}


Comments (4)

蓝戈者 2024-10-25 02:15:14

DON'T do it in R. The reason it's messy is that it's UGLY code. Constructing lots of variables with programmatic names is a BAD THING. Names are names. They have no structure, so do not try to impose one on them. Decent programming languages have structures for this - rubbishy programming languages have tacked-on 'Macro' features and end up with this awful pattern of constructing variable names by pasting strings together. This is a practice from the 1970s that should have died out by now. Don't be a programming dinosaur.

For example, how do you know how many popXXXX variables you have? How do you know if you have a complete sequence of pop1990 to pop2000? What if you want to save the variables to a file to give to someone? Yuck, yuck, yuck.

Use a data structure that the language gives you. In this case, probably a list.
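
A minimal sketch of what that could look like, assuming one starting vector per group and a matrix of yearly trends. The names pop1989, trend and pop, and all the numbers, are made up for illustration and do not come from the question:

```r
# Hypothetical starting data: 1989 populations for groups A-D, 3 observations each
pop1989 <- list(A = c(100, 200, 300), B = c(50, 60, 70),
                C = c(10, 20, 30),    D = c(5, 5, 5))
years <- 1990:2000

# Hypothetical trends: one row per observation, one column per year
set.seed(1)
trend <- matrix(runif(3 * length(years), -0.1, 0.1), nrow = 3,
                dimnames = list(NULL, as.character(years)))

# One list element per group; each element is an observations x years matrix
pop <- lapply(pop1989, function(start) {
  out <- matrix(NA_real_, nrow = length(start), ncol = length(years),
                dimnames = list(NULL, as.character(years)))
  prev <- start
  for (y in as.character(years)) {
    prev <- prev * (1 + trend[, y])  # same recurrence as the Stata loop
    out[, y] <- prev
  }
  out
})

length(pop)      # how many groups there are
colnames(pop$A)  # which years are present for group A
```

With this layout the questions above have direct answers: length(pop) counts the groups, colnames() shows the years, and saveRDS(pop, "pop.rds") writes everything to one file.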

韬韬不绝 2024-10-25 02:15:14

Both Spacedman and Joshua have very valid points. As Stata has only one dataset in memory at any given time, I'd suggest adding the variables to a data frame (which is also a kind of list) instead of to the global environment (see the last code block in this answer).

But honestly, the more R-ish way to do this is to keep your factors as factors instead of encoding them in variable names.

First I make some data that looks the way I believe yours does in R now (at least, I hope so...):

Data <- data.frame(
    popA1989 = 1:10,
    popB1989 = 10:1,
    popC1989 = 11:20,
    popD1989 = 20:11
)

Trend <- replicate(11,runif(10,-0.1,0.1))

You can then use the stack() function to obtain a data frame with a factor pop and a numeric variable year:

newData <- stack(Data)
newData$pop <- factor(substr(as.character(newData$ind), 4, 4))  # population letter, e.g. "A"
newData$year <- as.numeric(substr(as.character(newData$ind), 5, 8))  # year, e.g. 1989
newData$ind <- NULL

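Filling up the data frame is then quite easy:

for(i in 1:11){
  # rows for the previous year (1989 when i = 1)
  tmp <- newData[newData$year == (1988 + i), ]
  newData <- rbind(newData,
      # grow by the trend, as in the Stata code; Trend[, i] is recycled across the four populations
      data.frame( values = tmp$values * (1 + Trend[, i]),
                  pop = tmp$pop,
                  year = tmp$year + 1
      )
  )
}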

In this format, you'll find most R operations (selecting some years or a single population, modelling the effect of either or both, ...) a whole lot easier to perform later on.
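
As a rough illustration of those operations, here is a small sketch on a made-up data frame with the same three columns as newData. The object name long and its values are hypothetical:

```r
# Hypothetical long-format data with the same columns as newData
long <- data.frame(
  values = c(1, 2, 3, 4, 10, 20, 30, 40),
  pop    = rep(c("A", "B"), each = 4),
  year   = rep(c(1989, 1989, 1990, 1990), times = 2)
)

# Select a single year
y1990 <- subset(long, year == 1990)

# Select a single population
popA <- subset(long, pop == "A")

# Mean value per population and year
means <- aggregate(values ~ pop + year, data = long, FUN = mean)
```

The same calls would need name-pasting loops if every population and year lived in its own column.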

And if you insist, you can still create a wide format with unstack():

unstack(newData, values ~ paste("pop", pop, year, sep=""))

An adaptation of Joshua's answer that adds the columns to the data frame:

for(L in LETTERS[1:4]) {
  for(i in 1990:2000) {
    new <- paste("pop",L,i,sep="")    # name of the new column
    old <- paste("pop",L,i-1,sep="")  # name of the previous year's column
    trend <- Trend[,i-1989]  # trend for year i
    Data[[new]] <- Data[[old]] * (1 + trend)  # add the new column to Data
  }
}

不打扰别人 2024-10-25 02:15:14

Assuming popA1989, popB1989, popC1989, popD1989 and the trend variables trend1990 to trend2000 already exist in your global environment, the code below should work. There are certainly more "R-like" ways to do this, but I wanted to give you something similar to your Stata code.

for(L in LETTERS[1:4]) {
  for(i in 1990:2000) {
    new <- paste("pop",L,i,sep="")  # create name for new variable
    old <- get(paste("pop",L,i-1,sep=""))  # get old variable
    trend <- get(paste("trend",i,sep=""))  # get trend variable
    assign(new, old*(1+trend))  # create the new variable in the current environment
  }
}

指尖凝香 2024-10-25 02:15:14

Assuming you have the 1989 population values in a vector pop1989 (one value per population, A to D) and the yearly trends for 1990 to 2000 in a vector trend:

library(stringr)  # str_c() concatenates without a separator by default
dta <- kronecker(pop1989, cumprod(1 + trend))
names(dta) <- kronecker(str_c("pop", LETTERS[1:4]), 1990:2000, str_c)
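
The same compounding can also be kept as a years-by-populations matrix instead of a flat named vector. This is only a sketch under the same assumptions (one 1989 value per population, one trend per year); the input values and the name popmat are hypothetical:

```r
# Hypothetical inputs: one 1989 value per population, one trend per year
pop1989 <- c(A = 100, B = 200, C = 300, D = 400)
trend   <- setNames(rep(0.1, 11), 1990:2000)

# cumprod() compounds the yearly growth factors from 1990 onwards
growth <- cumprod(1 + trend)

# Rows are years, columns are populations
popmat <- outer(growth, pop1989)
dimnames(popmat) <- list(year = names(trend), pop = names(pop1989))

popmat["1990", "A"]  # value for population A in 1990
```

Indexing by year and population then replaces any lookup by pasted-together names.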