How can I rewrite this Stata code in R?
One of the things Stata does well is the way it constructs new variables (see the example below). How can I do this in R?
foreach i in A B C D {
    forval n = 1990/2000 {
        local m = `n' - 1
        * create new columns from existing ones on-the-fly
        generate pop`i'`n' = pop`i'`m' * (1 + trend`n')
    }
}
Comments (4)
Don't do it in R. The reason it's messy is that it's UGLY code. Constructing lots of variables with programmatic names is a BAD THING. Names are names. They have no structure, so do not try to impose one on them. Decent programming languages have structures for this; rubbishy programming languages have tacked-on 'Macro' features and end up with this awful pattern of constructing variable names by pasting strings together. This is a practice from the 1970s that should have died out by now. Don't be a programming dinosaur.
For example, how do you know how many popXXXX variables you have? How do you know if you have a complete sequence of pop1990 to pop2000? What if you want to save the variables to a file to give to someone? Yuck, yuck, yuck.
Use a data structure that the language gives you. In this case, probably a list.
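As a rough illustration of the list idea, here is a minimal sketch. The base values, the 1% growth rate, and the names groups, base, trend and pop are all made up for the example, and trend is simplified to a single rate per year rather than one Stata variable per year.

```r
# Hypothetical inputs: 1989 base populations and one growth rate per year.
groups <- c("A", "B", "C", "D")
years  <- 1990:2000
base   <- c(A = 100, B = 200, C = 300, D = 400)        # made-up 1989 values
trend  <- setNames(rep(0.01, length(years)), years)    # made-up 1% growth

# One list element per group; each element is a numeric vector named by year.
# cumprod(1 + trend) applies pop[n] = pop[n - 1] * (1 + trend[n]) cumulatively.
pop <- lapply(base, function(p0) c("1989" = p0, p0 * cumprod(1 + trend)))

length(pop)        # how many groups there are
names(pop$A)       # which years are present for group A
pop$A[["1990"]]    # a single value, looked up by year
```

With this layout the questions above (how many series, is the year sequence complete) become ordinary calls to length() and names(), and the whole structure can be saved in one go with saveRDS().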
Both Spacedman and Joshua have very valid points. As Stata has only one dataset in memory at any given time, I'd suggest adding the variables to a data frame (which is also a kind of list) instead of to the global environment (see below).
But honestly, the more R-ish way to do this is to keep your factors as factors instead of encoding them in variable names.
I made some data shaped the way I believe yours is in R right now (at least, I hope so...).
You can then use the stack() function to obtain a data frame with a factor pop and a numeric variable year.
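A minimal sketch of the stack() step, assuming a wide data frame whose columns are named popA1989 to popD1989. The values and the names wide and long are invented for illustration.

```r
# Made-up wide data: one column per group, all for 1989 (hypothetical values).
wide <- data.frame(
  popA1989 = c(100, 110),
  popB1989 = c(200, 210),
  popC1989 = c(300, 310),
  popD1989 = c(400, 410)
)

# stack() returns two columns: `values` and `ind` (the former column names).
long <- stack(wide)

# Split the old column names into a group factor and a numeric year.
long$pop  <- factor(substr(as.character(long$ind), 4, 4))
long$year <- as.numeric(substr(as.character(long$ind), 5, 8))
long$ind  <- NULL
```

The substr() positions depend on the fixed "pop" + letter + four-digit-year naming pattern; other naming schemes would need different positions or a regular expression.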
Filling up the data frame is then quite easy:
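One way the filling step might look, as a sketch under assumptions: the id column, the values, and the per-year trend vector are hypothetical, and trend is again one rate per year.

```r
# Made-up long-format base data for 1989: two observations (id) per group.
long <- data.frame(
  id     = rep(1:2, times = 4),
  pop    = factor(rep(c("A", "B", "C", "D"), each = 2)),
  year   = 1989,
  values = c(100, 110, 200, 210, 300, 310, 400, 410)
)

# Hypothetical growth rate per year, named by year.
trend <- setNames(seq(0.01, 0.02, length.out = 11), 1990:2000)

# Append one block of rows per year, each computed from the previous year's rows.
for (yr in 1990:2000) {
  prev <- long[long$year == yr - 1, ]
  prev$values <- prev$values * (1 + trend[[as.character(yr)]])
  prev$year   <- yr
  long <- rbind(long, prev)
}
```

Growing a data frame with rbind() inside a loop is fine for eleven years; for large data it would be worth collecting the pieces in a list and binding them once at the end.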
In this format, you'll find most R commands (selections of some years, of a single population, modelling effects of either or both, ...) a whole lot easier to perform later on.
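To give a flavour of why the long format is convenient, here is a small sketch with made-up data (the values and the names long, recentA and means are assumptions for the example).

```r
# Made-up long-format data: one row per group and year (hypothetical values).
long <- expand.grid(pop = c("A", "B", "C", "D"), year = 1989:2000)
long$values <- 100 * 1.01^(long$year - 1989)

# Select some years for a single population.
recentA <- subset(long, pop == "A" & year >= 1998)

# Mean value per population across all years.
means <- aggregate(values ~ pop, data = long, FUN = mean)
```

Neither step needs to know the column names in advance or paste strings together; the group and the year are ordinary data.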
And if you insist, you can still create a wide format with unstack().
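A minimal sketch of going back to wide format with unstack(), assuming a long data frame with pop, year and values columns. The data and the helper column ind are made up for the example.

```r
# Made-up long-format data for two groups and two years (hypothetical values).
long <- data.frame(
  pop    = rep(c("A", "B"), each = 4),
  year   = rep(rep(c(1989, 1990), each = 2), times = 2),
  values = c(100, 110, 101, 111.1, 200, 210, 202, 212.1)
)

# Rebuild the wide column names, then unstack values by that name.
long$ind <- factor(paste0("pop", long$pop, long$year))
wide <- unstack(long, values ~ ind)
```

unstack() lines observations up by their order within each group, so this only gives a sensible result if every group/year combination has the same number of rows in the same order.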
An adaptation of Joshua's answer that adds the columns to the data frame:
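A sketch of what such a wide-format loop could look like, mirroring the structure of the Stata code. The data frame name Data, its values, and the constant 1% trend columns are assumptions for illustration.

```r
# Made-up base data: 1989 populations for four groups (hypothetical values).
Data <- data.frame(
  popA1989 = c(100, 110),
  popB1989 = c(200, 210),
  popC1989 = c(300, 310),
  popD1989 = c(400, 410)
)

# Hypothetical trend columns, one per year, mirroring trend1990..trend2000.
for (n in 1990:2000) Data[[paste0("trend", n)]] <- 0.01

# Same double loop as the Stata code; [[ ]] reads and writes columns by name.
for (i in c("A", "B", "C", "D")) {
  for (n in 1990:2000) {
    new  <- paste0("pop", i, n)
    prev <- paste0("pop", i, n - 1)
    Data[[new]] <- Data[[prev]] * (1 + Data[[paste0("trend", n)]])
  }
}
```

Because everything lives in one data frame, nothing is created in the global environment, and the constructed names are only ever used as column labels.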
Assuming popA1989, popB1989, popC1989, and popD1989 already exist in your global environment, the code below should work. There are certainly more "R-like" ways to do this, but I wanted to give you something similar to your Stata code.
Assuming you have population data in a vector pop1989 and trend data in trend.
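For the vector-based setup, one possible sketch is below. It assumes pop1989 holds one value per group and trend holds one growth rate per year; the values and the names growth and pop are made up for the example.

```r
# Made-up inputs: pop1989 has one value per group, trend one rate per year.
pop1989 <- c(A = 100, B = 200, C = 300, D = 400)
trend   <- setNames(rep(0.01, 11), 1990:2000)

# Cumulative growth factor for each year relative to 1989.
growth <- cumprod(1 + trend)

# Rows are groups, columns are years: pop[i, n] = pop1989[i] * growth[n].
pop <- cbind("1989" = pop1989, outer(pop1989, growth))

pop["A", "1990"]   # value for group A in 1990
```

This avoids the explicit loop entirely: the recurrence pop[n] = pop[n - 1] * (1 + trend[n]) is the same as multiplying the 1989 value by the running product of the growth factors.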