Apply the cut function to all columns of a data frame

Posted on 2025-01-26 08:53:09

I have a data frame that is composed of 10 continuous variables:

dat <- data.frame(replicate(10, sample(0:10, 15, replace = TRUE)))

Let's say I want to bin one of the columns by width, so the lowest 1/3 of values would be low, the middle 1/3 of values would be medium, etc.

break_point <- sort(dat$X1)[round(1 * length(dat$X1)/3)]
break_point1 <- sort(dat$X1)[round(2 * length(dat$X1)/3)]

dat$X1 <- cut(dat$X1, breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high"))

How can I compute this bin for all the columns at the same time?

dat[1:length(dat)] <- lapply(dat[1:length(dat)], cut(breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high")))

This is what I have, but it's not working. It fails with:

Error in cut.default(breaks = c(-Inf, break_point, break_point1, Inf),  :
  argument "x" is missing, with no default

Comments (2)

爱*していゐ 2025-02-02 08:53:09

We may need a lambda (anonymous) function, so that each column is passed to cut as x:

dat[] <- lapply(dat, function(x) cut(x,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high")))

Or simply pass cut itself as the function and supply the other arguments by name, instead of calling cut( inside lapply:

dat[] <- lapply(dat, cut, 
      breaks = c(-Inf, break_point, break_point1, Inf),
      labels = c("low", "medium", "high"))
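
Note that both calls above reuse break_point and break_point1, which were computed from X1, for every column. If each column should get its own break points instead, one possible sketch follows. The helper name bin_thirds and the small deterministic data frame are made up for illustration; the break-point logic is copied from the question.

```r
# Hypothetical helper: bin one numeric vector into thirds, using the same
# order-statistic break points as in the question.
bin_thirds <- function(x) {
  n <- length(x)
  s <- sort(x)
  lower <- s[round(1 * n / 3)]
  upper <- s[round(2 * n / 3)]
  # Caveat: with heavily tied data the two break points can coincide,
  # and cut() rejects duplicated breaks; that case is not handled here.
  cut(x, breaks = c(-Inf, lower, upper, Inf),
      labels = c("low", "medium", "high"))
}

# Made-up example data with no ties, so the result is easy to check by eye
dat <- data.frame(X1 = 1:15, X2 = seq(150, 10, by = -10))

# dat[] keeps the data frame structure while replacing every column
dat[] <- lapply(dat, bin_thirds)

table(dat$X1)
table(dat$X2)
```

Because lapply calls bin_thirds once per column, the break points are recomputed from each column's own values.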

掩于岁月 2025-02-02 08:53:09

Try santoku::chop_equally():

library(santoku)
dat[] <- apply(dat, 2, santoku::chop_equally, groups = 3,
        labels = c("low", "medium", "high"))

      X1       X2       X3       X4       ......
 [1,] "low"    "high"   "low"    "medium" ......
 [2,] "high"   "high"   "low"    "low"    ......
 [3,] "high"   "high"   "high"   "low"    ......
 ......

Note that this creates separate breakpoints for each column, based on the quantiles of the column.
If you always want the same breakpoints, just do

breaks <- quantile(as.matrix(dat), 0:3/3)
# include.lowest = TRUE keeps the overall minimum from becoming NA
dat[] <- apply(dat, 2, cut, breaks = breaks, include.lowest = TRUE)

Also, you said you wanted to chop by width of intervals (equal width of each interval), but your example is chopping by quantiles (equal numbers of cells in each interval). If you want width of intervals, use santoku::chop_evenly().
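
To make the difference concrete, here is a minimal base R sketch that uses only cut and quantile rather than santoku. The vector x is made up for illustration and is deliberately skewed so that the two approaches disagree.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)   # made-up, skewed values

# Equal width: a single number for `breaks` asks cut() for that many
# intervals of equal width spanning range(x)
by_width <- cut(x, breaks = 3, labels = c("low", "medium", "high"))

# Equal count: break at the tertiles; include.lowest = TRUE keeps the
# minimum inside the first interval
by_count <- cut(x, breaks = quantile(x, 0:3 / 3),
                labels = c("low", "medium", "high"),
                include.lowest = TRUE)

table(by_width)   # low 8, medium 0, high 1
table(by_count)   # low 3, medium 3, high 3
```

With equal-width bins the single outlier stretches the range, so almost every value lands in "low"; with quantile bins each label gets the same number of values.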
