Coding variable values into classes using R

Posted 2024-11-08 10:55:32

I have a set of data in which I need to code values of certain variables (numeric) into 3 classes.

My data set is similar to this but has 60 more variables:

anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <- c(181,179,180.5,201,201.5,245,246.4,189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)

> data
   anim    wt
1     1 181.0
2     2 179.0
3     3 180.5
4     4 201.0
5     5 201.5
6     6 245.0
7     7 246.4
8     8 189.3
9     9 301.0
10   10 354.0
11   11 369.0
12   12 205.0
13   13 199.0
14   14 394.0
15   15 231.3

I need to code the values of the variable "wt" into 3 classes: (wt >= 179 & wt < 200) = 1; (wt >= 200 & wt < 300) = 2; (wt > 300) = 3

This should give me:

> data2
   anim    wt SWT
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   2
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2

Comments (5)

云淡月浅 2024-11-15 10:55:32

The cut method as outlined by @Greg is probably what you want here. One thing to note is that cut returns a factor by default; supplying labels = FALSE makes it return the integer codes instead. Adding right = FALSE closes the intervals on the left, so a weight of exactly 200 lands in class 2 as the question requires:

cut(data$wt, c(179, 200, 300, Inf), right = FALSE, labels = FALSE)
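
Whether a value that sits exactly on a break goes to the lower or the upper class depends on cut's right argument. Here is a minimal sketch of the difference; the vector x is made up for illustration and is not part of the question's data:

```r
# Hypothetical boundary values, chosen to sit on or next to the breaks
x <- c(179, 199.9, 200, 299.9, 300, 450)

# Default right = TRUE: intervals are (178,200], (200,300], (300,Inf]
right_closed <- cut(x, c(178, 200, 300, Inf), labels = FALSE)

# right = FALSE: intervals are [179,200), [200,300), [300,Inf)
left_closed <- cut(x, c(179, 200, 300, Inf), right = FALSE, labels = FALSE)

# Side-by-side view: 200 and 300 are the values that change class
print(rbind(x, right_closed, left_closed))
```

None of the 15 weights in the question falls exactly on 200 or 300, which is why both variants give the same SWT column for that data.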

Alternatively, if your classes do not lend themselves to natural breaks, you can use ifelse(). You can nest ifelse() calls much as you would nest IF in Excel. I use with() to cut down on the typing needed:

data$group2 <- with(data, ifelse(wt >= 179 & wt < 200, 1,
  ifelse(wt >= 200 & wt < 300, 2, 3)))
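
Since the question mentions many more variables, the nested ifelse() can be wrapped in a helper and applied to several columns at once. This is only a sketch, assuming the same cut-offs apply to every column; the helper name code3 and the ht column are hypothetical:

```r
# Toy data; `ht` is a hypothetical second variable added for illustration
data <- data.frame(
  anim = 1:5,
  wt   = c(181, 201, 246.4, 301, 199),
  ht   = c(250, 180, 310, 199, 200)
)

# Hypothetical helper wrapping the nested ifelse();
# anything outside the first two ranges (including values below 179) gets 3
code3 <- function(x) {
  ifelse(x >= 179 & x < 200, 1,
         ifelse(x >= 200 & x < 300, 2, 3))
}

# Columns to recode; the new columns are named SWT and SHT
vars <- c("wt", "ht")
data[paste0("S", toupper(vars))] <- lapply(data[vars], code3)
data
```

If the cut-offs differ per variable, the helper would need the breaks as an extra argument.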
韵柒 2024-11-15 10:55:32

You can try cut:

anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <- c(181,179,180.5,201,201.5,245,246.4,
        189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim, wt)

EDIT: fixed the grouping with right = FALSE and removed the split example.

group <- cut(data$wt, c(178, 200, 300, Inf), right = FALSE)
data$swt <- as.numeric(group)
data
   anim    wt swt
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   2
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
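
To check which intervals cut actually built before converting the factor to numbers, you can inspect its levels and tabulate the codes. This is a small sketch that rebuilds the wt vector so it runs on its own:

```r
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)

group <- cut(wt, c(178, 200, 300, Inf), right = FALSE)

# The factor levels are the interval labels, in the order of the breaks
levels(group)

# as.numeric() on a factor returns the level index, i.e. the class number
swt <- as.numeric(group)

# Number of animals per class
class_counts <- table(swt)
class_counts
```

as.numeric() works here only because the levels are ordered by the breaks; it returns the position of the level, not a number parsed from the label.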
暮年慕年 2024-11-15 10:55:32

I think Greg's answers cover "standard operating procedure", but I find many uses for the findInterval function as well. It naturally returns a number that identifies the interval in the second argument.

 data$int <- findInterval(data$wt, c(179, 200, 300, Inf))
 data
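
A quick sketch of how findInterval treats values on and outside the breaks; the vector x is made up for illustration:

```r
# Hypothetical values: one below the first break, several on the breaks
x <- c(150, 179, 199.9, 200, 300, 394)

# Intervals are closed on the left by default: [179,200), [200,300), [300,Inf)
cls <- findInterval(x, c(179, 200, 300, Inf))
cls
```

Values below the first break come back as 0 rather than NA, so it may be worth checking for zeros if weights under 179 can occur.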
与往事干杯 2024-11-15 10:55:32

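Just to show an alternative method from the car package (similar to recode in SPSS). Note that recode ranges include both endpoints, so a weight of exactly 200 would be coded 1 here:

> library(car)
> data$SWT <- with(data, recode(wt, "lo:200=1; 300:hi=3; else=2"))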
> data
   anim    wt SWT
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   2
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
风和你 2024-11-15 10:55:32

Just for completeness and info, the classInt package (on CRAN) is another handy way to classify numbers into classes.
