Create binned values of a numeric column

Posted 2024-10-31 09:11:54

I have a data frame with a few columns. One of them is rank, an integer between 1 and 20. I want to add another column that holds a bin label such as "1-4", "5-10", "11-15", or "16-20".

What is the most effective way to do this?

My data frame looks like this (.csv format):

rank,name,info
1,steve,red
3,joe,blue
6,john,green
3,liz,yellow
15,jon,pink

I want to add another column to the data frame, so that it looks like this:

rank,name,info,binValue
1,steve,red,"1-4"
3,joe,blue,"1-4"
6,john,green,"5-10"
3,liz,yellow,"1-4"
15,jon,pink,"11-15"

My current approach is not working: I want to keep the data.frame intact and just add another column whose value depends on which range df$rank falls in. Thank you.

Comments (3)

一场春暖 2024-11-07 09:11:54

See ?cut and specify breaks (and optionally labels). The breaks should span the whole 1-20 range, otherwise ranks above the last break become NA.

x$bins <- cut(x$rank, breaks=c(0,4,10,15,20), labels=c("1-4","5-10","11-15","16-20"))
x
#   rank  name   info  bins
# 1    1 steve    red   1-4
# 2    3   joe   blue   1-4
# 3    6  john  green  5-10
# 4    3   liz yellow   1-4
# 5   15   jon   pink 11-15
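
As a minimal sketch of what cut does with its breaks (not part of the original answer): by default each interval is open on the left and closed on the right, so breaks of 0, 4, 10, 15, 20 give (0,4], (4,10], (10,15], (15,20]. For integer ranks, the "1-4"-style labels can be derived from the breaks instead of being typed by hand. The names ranks, breaks and bin_labels are illustrative, and the upper break of 20 is an assumption taken from the 1-20 range stated in the question.

```r
# Illustrative values: the ranks from the question plus the extreme value 20
ranks  <- c(1, 3, 6, 3, 15, 20)
breaks <- c(0, 4, 10, 15, 20)   # assumed to cover the full 1-20 range

# Default labels show the half-open intervals (lower, upper]
default_bins <- cut(ranks, breaks = breaks)
levels(default_bins)

# For integer data, build "lower-upper" labels from the breaks themselves
bin_labels <- paste0(head(breaks, -1) + 1, "-", breaks[-1])
bins <- cut(ranks, breaks = breaks, labels = bin_labels)
as.character(bins)

# Values outside the breaks are not binned; cut returns NA for them
is.na(cut(21, breaks = breaks))
```

Deriving the labels this way keeps them in sync with the breaks if the bin edges change later.
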
寒江雪… 2024-11-07 09:11:54
dat <- "rank,name,info
1,steve,red
3,joe,blue
6,john,green
3,liz,yellow
15,jon,pink"

x <- read.table(textConnection(dat), header=TRUE, sep=",", stringsAsFactors=FALSE)
x$bins <- cut(x$rank, breaks=seq(0, 20, 5), labels=c("1-5", "6-10", "11-15", "16-20"))
x

#   rank  name   info  bins
# 1    1 steve    red   1-5
# 2    3   joe   blue   1-5
# 3    6  john  green  6-10
# 4    3   liz yellow   1-5
# 5   15   jon   pink 11-15
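
One thing worth knowing about the result (a small sketch, not from the original answer): cut returns a factor, so the new column keeps all four levels even when a bin such as "16-20" has no rows. The example below rebuilds the sample data with read.csv(text = ...), counts rows per bin, and shows as.character as one way to get plain strings if a factor is not wanted. The names bin_counts and bins_chr are illustrative.

```r
dat <- "rank,name,info
1,steve,red
3,joe,blue
6,john,green
3,liz,yellow
15,jon,pink"

# read.csv(text = ...) parses the CSV string directly
x <- read.csv(text = dat, stringsAsFactors = FALSE)
x$bins <- cut(x$rank, breaks = seq(0, 20, 5),
              labels = c("1-5", "6-10", "11-15", "16-20"))

# The new column is a factor that remembers every level
is.factor(x$bins)

# table() therefore reports the empty "16-20" bin with a count of 0
bin_counts <- table(x$bins)
bin_counts

# Optional: a plain character copy of the labels
x$bins_chr <- as.character(x$bins)
```
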
﹏雨一样淡蓝的深情 2024-11-07 09:11:54

We can use smart_cut from the cutr package:

# devtools::install_github("moodymudskipper/cutr")
library(cutr)

Using @Andrie's sample data:

x$bins <- smart_cut(x$rank,
                    c(1,5,11,16), 
                    labels = ~paste0(.y[1],'-',.y[2]-1), 
                    simplify = FALSE)
# rank  name   info  bins
# 1    1 steve    red   1-4
# 2    3   joe   blue   1-4
# 3    6  john  green  5-10
# 4    3   liz yellow   1-4
# 5   15   jon   pink 11-15

more on cutr and smart_cut
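
If installing a package from GitHub is not an option, base R's findInterval accepts the same lower-bound style of breaks. This is a sketch of an alternative, not the cutr API: findInterval returns, for each value, the index of the last break that is less than or equal to it, and that index is then used to look up a label. The names lower_bounds and bin_labels are illustrative.

```r
# Ranks from the sample data
ranks <- c(1, 3, 6, 3, 15)

# Lower bound of each bin, in increasing order
lower_bounds <- c(1, 5, 11, 16)
bin_labels   <- c("1-4", "5-10", "11-15", "16-20")

# Index of the bin each rank falls into (1 = first bin)
idx <- findInterval(ranks, lower_bounds)

# Look up the label for each index.
# Caveat: a rank below the first bound would give index 0, which this
# lookup silently drops, so such values would need separate handling.
bins <- bin_labels[idx]
bins
```

Unlike cut, this yields a character vector rather than a factor, which may or may not be what is wanted downstream.
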
