How to rank observations within groups in Stata?
I have some data in Stata which look like the first two columns of:
group_id var_to_rank desired_rank
____________________________________
1 10 1
1 20 2
1 30 3
1 40 4
2 10 1
2 20 2
2 20 2
2 30 3
I'd like to create a rank of each observation within group (group_id) according to one variable (var_to_rank). Usually, for this purpose I used:
gen id = _n
However, some of my observations (group_id = 2 in my small example) have the same value of the ranking variable, so this approach doesn't work.
I have also tried using the:
egen rank
command with different options, but I cannot make my rank variable look like desired_rank.
Could you point me to a solution to this problem?
Comments (6)
The following works for me:
I'd say this question is posed the wrong way round for best understanding. The aim is to group observations: those with the lowest value are all assigned grade 1, those with the next lowest value are all assigned 2, and so forth. This isn't ranking in most senses that I have seen discussed, but Stata's egen, rank() does get you part of the way.
But the direct way, which was mentioned in the Statalist thread cited elsewhere in this thread (start here), is simpler in spirit than any solution quoted:
Once data are sorted on var_to_rank, the expression var_to_rank != var_to_rank[_n-1] yields 1 whenever a value differs from the previous value, that is, at the start of each block of distinct values; otherwise it yields 0. Summing those 1s and 0s cumulatively gives the desired variable. The prefix command bysort does the sorting required and ensures that this is all done separately within the groups defined by group_id. No need for egen at all (a command that many people who only use Stata occasionally often find bizarre).
Declaration of interest: The Statalist thread cited shows that when asked a similar question I too did not think of this solution at the first attempt.
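A minimal sketch of the approach described above, reconstructed from the prose rather than copied from the original post. The variable name my_rank is hypothetical, and the input block simply re-enters the question's example data so the snippet can be run on its own:

```stata
clear
input group_id var_to_rank desired_rank
1 10 1
1 20 2
1 30 3
1 40 4
2 10 1
2 20 2
2 20 2
2 30 3
end

* Sort by var_to_rank within group_id. The comparison is 1 at the start of
* each block of distinct values (for the first observation of a group,
* var_to_rank[_n-1] is missing, so the comparison is also 1) and 0 otherwise.
* sum() is a running sum, so it accumulates those 1s within each group.
bysort group_id (var_to_rank): gen my_rank = sum(var_to_rank != var_to_rank[_n-1])

* Check against the column the question asked for.
assert my_rank == desired_rank
```

Putting var_to_rank in parentheses asks bysort to sort on it without treating it as a grouping variable, which is what keeps the running sum from restarting at every distinct value.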
Stumbled upon this solution on Statalist:
Seems to sort out this issue.
@radek: you surely got it sorted out in the meantime ... but this would have been an easy (though not very elegant) solution:
Way too much work. Easy and elegant. Try this one.
gen desired_rank=int(var_to_rank/10)
Try this command; it works well for me:
egen newid=group(oldid)
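As written, that command numbers the distinct values of one variable across the whole dataset. Below is a rough sketch of how it might be adapted to the question's data, assuming group() is applied to both variables jointly and the result is then rebased within each group. The names pair_id and my_rank are hypothetical and not from the original answer:

```stata
clear
input group_id var_to_rank desired_rank
1 10 1
1 20 2
1 30 3
1 40 4
2 10 1
2 20 2
2 20 2
2 30 3
end

* Number each distinct (group_id, var_to_rank) pair 1, 2, ... in sorted order.
egen pair_id = group(group_id var_to_rank)

* Rebase so that numbering restarts at 1 within each group_id.
* Under by:, pair_id[1] is the first observation of the current group.
bysort group_id (pair_id): gen my_rank = pair_id - pair_id[1] + 1

assert my_rank == desired_rank
```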