Create a categorical variable in R based on ranges
I have a data frame with a column of integers that I would like to use as a reference to create a new categorical variable. I want to divide the variable into three groups and set the ranges myself (i.e. 0-5, 6-10, etc.). I tried cut, but that divides the variable into groups based on a normal distribution, and my data is right-skewed. I also tried if/then statements, but those output TRUE/FALSE values and I would like to keep my original variable. I am sure there is a simple way to do this, but I cannot figure it out. Any advice on a simple way to do this quickly?
I had something like this in mind:
x    x.range
3    0-5
4    0-5
6    6-10
12   11-15
Comments (3)
Ian's answer (cut) is the most common way to do this, as far as I know.
I prefer to use shingle, from the lattice package; the argument that specifies the binning intervals seems a little more intuitive to me.
You use shingle like so:
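What follows is only a minimal sketch of how such a call could look, assuming the lattice package is available (it ships with standard R distributions). The vector x and the matrix my_bins are illustrative names and values, not taken from the answer. shingle() accepts a two-column matrix whose rows are the lower and upper bounds of each bin; the cut() call at the end is the base-R approach mentioned above, shown for comparison.

```r
library(lattice)

# Illustrative data: the values from the question
x <- c(3, 4, 6, 12)

# One row per bin: lower bound, upper bound (assumed ranges)
my_bins <- rbind(c(0, 5), c(6, 10), c(11, 15))

# Build the shingle; the intervals are stored as its levels
shx <- shingle(x, intervals = my_bins)
levels(shx)

# Base-R comparison: cut() with explicit breaks and labels
x_range <- cut(x, breaks = c(0, 5, 10, 15),
               labels = c("0-5", "6-10", "11-15"),
               include.lowest = TRUE)
data.frame(x, x.range = x_range)
```

Note that a shingle keeps the original numeric values and attaches the intervals to them, which suits conditioning in lattice plots; if a plain label column like the one in the question is needed, the factor returned by cut() is probably the more direct fit.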
We can use smart_cut from the cutr package.
To cut with intervals of length 5 starting at 1:
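The smart_cut call itself is not shown here. As a rough base-R stand-in (this is not cutr's API), bins of width 5 starting at 1 can be sketched with cut() and seq(). The upper limit of 16 and the name binned are assumptions chosen to cover the example values.

```r
x <- c(3, 4, 6, 12)

# Breaks at 1, 6, 11, 16: bins of width 5 starting at 1
breaks <- seq(from = 1, to = 16, by = 5)

# right = FALSE makes the bins left-closed: [1,6), [6,11), [11,16)
binned <- cut(x, breaks = breaks, right = FALSE)
binned
```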
To get exactly your requested output:
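As a hedged base-R sketch of the same result (again not the cutr call, which is not shown here), the labels can be built from the breaks themselves. The data frame df and the breaks vector are illustrative assumptions based on the ranges in the question.

```r
x <- c(3, 4, 6, 12)

# Left-closed integer bins: [0,6), [6,11), [11,16)
breaks <- c(0, 6, 11, 16)

# Label each bin "lower-upper", where upper is the last integer inside it
labels <- paste0(head(breaks, -1), "-", tail(breaks, -1) - 1)

# Keep the original variable and add the range as a new factor column
df <- data.frame(x = x)
df$x.range <- cut(df$x, breaks = breaks, labels = labels, right = FALSE)
df
```

Generating the labels from the breaks keeps the two in sync if the ranges change later.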
More on cutr and smart_cut