How do I create a new ties.method with R's rank() function?

Posted on 2024-09-09 01:00:23, 1463 characters, 6 views, 0 comments

I'm trying to order this data frame by population and date, so I'm using the order() and rank() functions:

> df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                   date       = c(rep(1950, 4), rep(2000, 4)),
                   population = c(500, 450, 350, 350, 650, 500, 500, 450))
> df
   idgeoville date    population
1  5          1950     500
2  8          1950     450
3  4          1950     350
4  3          1950     350
5  4          2000     650
6  5          2000     500
7  8          2000     500
8  8          2000     450

With ties.method = "first" I have no problem, and I end up with this data frame:

   idgeoville date    population  rank
1  5          1950     500        1
2  8          1950     450        2
3  4          1950     350        3
4  3          1950     350        4
5  4          2000     650        1
6  5          2000     500        2
7  8          2000     500        3
8  8          2000     450        4

What I actually want is a data frame in which equal populations within the same date get the same rank, with no gaps afterwards, like this:

   idgeoville date    population  rank
1  5          1950     500        1
2  8          1950     450        2
3  4          1950     350        3
4  3          1950     350        3
5  4          2000     650        1
6  5          2000     500        2
7  8          2000     500        2
8  8          2000     450        3

How can I solve this in R? With a custom ties.method, or with some other R trick?

太傻旳人生 2024-09-16 01:00:23

A simpler way (note that factor levels are sorted in ascending order, so the smallest population gets rank 1):

pop.rank <- as.numeric(factor(population))
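The one-liner above ranks the whole vector at once. A minimal sketch of how the same factor() trick might be applied separately for each date and in descending order, assuming the df from the question; using ave() and negating the population are illustrative choices, not part of the original answer:

```r
# Data frame copied from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# ave() applies FUN to the values of each date group and returns a
# vector aligned with the original rows. Negating population makes the
# largest value sort first, so it receives rank 1; tied values share a
# factor level and therefore share a rank.
df$rank <- ave(-df$population, df$date,
               FUN = function(x) as.numeric(factor(x)))

print(df)
```

With the question's data, the rank column comes out as 1, 2, 3, 3 for 1950 and 1, 2, 2, 3 for 2000, which is the layout the question asks for.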

奢望 2024-09-16 01:00:23

I don't believe rank() has an option for this. Here is a custom function that does what you want, though it may be too slow if your data is huge:

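Rank <- function(d) {
    # Distinct values, largest first; a value's position here is its rank.
    j <- unique(rev(sort(d)))
    # Look up each element's position among the distinct values.
    sapply(d, function(dd) which(dd == j))
}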
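If the per-element sapply() lookup turns out to be the slow part, one possible alternative is a single vectorized match() call. This is only a sketch; the helper name rank_dense_desc is hypothetical and does not come from the answer:

```r
# Hypothetical helper: descending rank with ties sharing a rank and no gaps.
rank_dense_desc <- function(d) {
  # Distinct values sorted from largest to smallest.
  levels_desc <- sort(unique(d), decreasing = TRUE)
  # match() returns, for every element of d, its position in levels_desc.
  match(d, levels_desc)
}

# Populations for 1950 from the question.
population_1950 <- c(500, 450, 350, 350)
ranks_1950 <- rank_dense_desc(population_1950)
print(ranks_1950)
```

For the 1950 populations this gives 1, 2, 3, 3. Like Rank(), it ranks whatever vector it is given, so it would still need to be applied per date to reproduce the table in the question.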

不离久伴 2024-09-16 01:00:23

This answers a slightly different question, namely how to sort a data.frame object based on multiple columns. To do this, you could use the function sort_df in package reshape:

> library(reshape)
> sort_df(df,vars=c('date','population'))
  idgeoville date population
3          4 1950        350
4          3 1950        350
2          8 1950        450
1          5 1950        500
8          8 2000        450
6          5 2000        500
7          8 2000        500
5          4 2000        650
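For comparison, the same multi-column sort can probably be done without an extra package by using order() from base R. A minimal sketch, assuming the df from the question:

```r
# Data frame copied from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# order() sorts by date first and breaks ties by population; indexing
# the rows with the result reorders the whole data frame.
sorted <- df[order(df$date, df$population), ]
print(sorted)
```

The rows come back in the order 3, 4, 2, 1, 8, 6, 7, 5, matching the sort_df output above.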
