Renaming large IDs

Posted 2024-08-03 11:23:08

Suppose I have a data.frame with N rows. The id column has 10 unique values; all those values are integers greater than 1e7. I would like to rename them to be numbered 1 through 10 and save these new IDs as a column in my data.frame.

Additionally, I would like to easily determine 1) id given id.new and 2) id.new given id.

For example:

> set.seed(123)
> ids <- sample(1:1e7,10)
> A <- data.frame(id=sample(ids,100,replace=TRUE),
                  x=rnorm(100))
> head(A)
       id          x
1 4566144  1.5164706
2 9404670 -1.5487528
3 5281052  0.5846137
4  455565  0.1238542
5 7883051  0.2159416
6 5514346  0.3796395

绝對不後悔。 2024-08-10 11:23:08

Try this:

A$id.new <- match(A$id,unique(A$id))

Additional comment:
To get the table of values:

rbind(unique(A$id.new),unique(A$id))
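If you want both lookups in one place, the same `match()` idea can be kept as a small mapping table. This is only a sketch: the names `old.ids` and `id.map` are made up for illustration, and new IDs are numbered in order of first appearance, not in sorted order.

```r
set.seed(123)
ids <- sample(1:1e7, 10)
A <- data.frame(id = sample(ids, 100, replace = TRUE), x = rnorm(100))

# Old IDs in order of first appearance; their position is the new ID.
old.ids <- unique(A$id)
A$id.new <- match(A$id, old.ids)

# Hypothetical lookup table with one row per distinct ID.
id.map <- data.frame(id = old.ids, id.new = seq_along(old.ids))

# 1) id.new given id
x <- A$id[1]
new.from.old <- id.map$id.new[match(x, id.map$id)]

# 2) id given id.new
y <- 2
old.from.new <- id.map$id[match(y, id.map$id.new)]
```

Because `id.map` is an ordinary data.frame, it can also be joined back onto other tables with `merge()`.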

风轻花落早 2024-08-10 11:23:08

Using factors:

> A$id <- as.factor(A$id)
> A$id.new <- as.numeric(A$id)
> head(A)
       id          x id.new
1 4566144  1.5164706      4
2 9404670 -1.5487528     10
3 5281052  0.5846137      5
4  455565  0.1238542      1
5 7883051  0.2159416      7
6 5514346  0.3796395      6

Suppose x is the old ID and you want the new one.

> x <- 7883051
> as.numeric(which(levels(A$id)==x))
[1] 7

Suppose y is the new ID and you want the old one.

> y <- 5
> as.numeric(as.character(A$id[which(as.integer(A$id)==y)[1]]))
[1] 5281052

(The above finds the first value of id at which the internal code for the factor is 5. Are there better ways?)
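One possibly simpler route for both directions is to go through `levels()` directly, since the integer code of a factor is just a position in its levels vector. A minimal sketch, assuming the factor is kept in a separate variable `f` (a name chosen here for illustration) so that `A$id` stays numeric:

```r
set.seed(123)
ids <- sample(1:1e7, 10)
A <- data.frame(id = sample(ids, 100, replace = TRUE), x = rnorm(100))

# factor() on a numeric vector sorts the distinct values numerically
# and stores them as character levels.
f <- factor(A$id)
A$id.new <- as.integer(f)

# Old ID -> new ID: position of the old ID among the levels.
x <- A$id[1]
new.id <- match(as.character(x), levels(f))

# New ID -> old ID: index straight into the levels.
y <- 2
old.id <- as.numeric(levels(f)[y])
```

This avoids scanning the whole column with `which()` just to find one matching row.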

流云如水 2024-08-10 11:23:08

You can use factor() / ordered() here:

R> set.seed(123)
R> ids <- sample(1:1e7,10)
R> A <- data.frame(id=sample(ids,100,replace=TRUE), x=rnorm(100))
R> A$id.new <- as.ordered(as.character(A$id))
R> table(A$id.new)

2875776 4089769  455565 4566144 5281052 5514346 7883051 8830172 8924185 9404670 
      6      10       6       8      12      10      13      10      10      15

And you can then use as.numeric() to map to 1 to 10:

R> A$id.new <- as.numeric(A$id.new)
R> summary(A)
       id                x               id.new     
 Min.   : 455565   Min.   :-2.3092   Min.   : 1.00  
 1st Qu.:4566144   1st Qu.:-0.6933   1st Qu.: 4.00  
 Median :5514346   Median :-0.0634   Median : 6.00  
 Mean   :6370243   Mean   :-0.0594   Mean   : 6.07  
 3rd Qu.:8853675   3rd Qu.: 0.5575   3rd Qu.: 8.25  
 Max.   :9404670   Max.   : 2.1873   Max.   :10.00  
R>
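One thing to be aware of in the table above: 455565 lands in third place, after 4089769, because going through `as.character()` makes the levels sort as strings, not as numbers. Whether that matters depends on whether the new IDs need to follow numeric order. A small sketch on four of the IDs from the example (the variable names `by.string` and `by.number` are just for illustration):

```r
ids <- c(4566144, 455565, 9404670, 4089769)

# Levels built from character strings are ordered lexicographically.
by.string <- as.integer(as.ordered(as.character(ids)))

# Levels built from the numeric vector are ordered by value.
by.number <- as.integer(factor(ids))

by.string  # 3 2 4 1
by.number  # 3 1 4 2
```

Either numbering gives a valid 1-to-n relabelling; they just disagree on which ID gets which number.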

好菇凉咱不稀罕他 2024-08-10 11:23:08

One option is to use the hash package:

> library(hash)
> sn <- sort(unique(A$id))
> g <- hash(1:length(sn),sn)
> h <- hash(sn,1:length(sn))
> A$id.new <- .get(h,A$id)
> head(A)
       id          x id.new
1 4566144  1.5164706      4
2 9404670 -1.5487528     10
3 5281052  0.5846137      5
4  455565  0.1238542      1
5 7883051  0.2159416      7
6 5514346  0.3796395      6

Suppose x is the old ID and you want the new one.

> x <- 7883051
> .get(h,as.character(x))
7883051 
      7

Suppose y is the new ID and you want the old one.

> y <- 5
> .get(g,as.character(y))
      5 
5281052

(This can sometimes be more convenient/transparent than using factors.)
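If you would rather not add a package dependency, roughly the same two-way lookup can be sketched with named vectors in base R. To be clear, this is a swapped-in technique and not the hash package's API; `h` and `g` are reused here only to mirror the names above.

```r
set.seed(123)
ids <- sample(1:1e7, 10)
A <- data.frame(id = sample(ids, 100, replace = TRUE), x = rnorm(100))

sn <- sort(unique(A$id))

# Named vectors act as simple dictionaries keyed by character strings.
h <- setNames(seq_along(sn), sn)   # old ID -> new ID
g <- setNames(sn, seq_along(sn))   # new ID -> old ID

A$id.new <- unname(h[as.character(A$id)])

# Old ID -> new ID
x <- A$id[1]
new.id <- h[[as.character(x)]]

# New ID -> old ID
y <- 2
old.id <- g[[as.character(y)]]
```

As with the hash version, keys have to be converted with `as.character()` before lookup; indexing with the raw number would be treated as a position, not a key.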
