Apply cut() to all columns of a data frame

Posted on 2025-01-26 08:53:09

I have a data frame that is composed of 10 continuous variables:

dat <- data.frame(replicate(10, sample(0:10, 15, replace = TRUE)))

Let's say I want to bin one of the columns by width, so the lowest 1/3 of values would be low, the middle 1/3 of values would be medium, etc.

break_point <- sort(dat$X1)[round(1 * length(dat$X1)/3)]
break_point1 <- sort(dat$X1)[round(2 * length(dat$X1)/3)]

dat$X1 <- cut(dat$X1, breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high"))
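The two breakpoint lines can be folded into one function so that the same rule can later be applied to any column. A minimal sketch: the name `bin_terciles` is hypothetical, and it reproduces the question's rule (the sorted values at positions round(n/3) and round(2n/3)) rather than calling `quantile()`.

```r
# Hypothetical helper: bins a numeric vector into low/medium/high using the
# question's rule (sorted values at positions round(n/3) and round(2n/3)).
bin_terciles <- function(x) {
  s <- sort(x)
  n <- length(x)
  b1 <- s[round(1 * n / 3)]
  b2 <- s[round(2 * n / 3)]
  cut(x, breaks = c(-Inf, b1, b2, Inf),
      labels = c("low", "medium", "high"))
}

# Made-up example vector with distinct values
x <- c(9, 1, 5, 3, 7, 2, 8, 4, 6)
print(bin_terciles(x))
```

With heavily tied data, b1 and b2 can coincide, and `cut()` then stops with "'breaks' are not unique".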

How can I compute this bin for all the columns at the same time?

dat[1:length(dat)] <- lapply(dat[1:length(dat)], cut(breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high")))

This is what I have, but it's not working. It fails with:

Error in cut.default(breaks = c(-Inf, break_point, break_point1, Inf),
: argument "x" is missing, with no default
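The error comes from evaluation order: `cut(breaks = ..., labels = ...)` is an ordinary function call, so R runs it immediately, before `lapply()` has handed it any column, and `x` is therefore missing. `lapply(X, FUN, ...)` expects a function object as `FUN` and forwards any extra named arguments to it. A small illustration on a made-up toy list:

```r
# Calling cut() without x fails on its own, outside lapply() entirely
msg <- tryCatch(cut(breaks = c(-Inf, 3, 6, Inf)),
                error = function(e) conditionMessage(e))
print(msg)

# Passing the function itself works: lapply() supplies each element as x
# and forwards breaks and labels to cut()
cols <- list(a = c(1, 5, 9), b = c(2, 4, 7))
res <- lapply(cols, cut,
              breaks = c(-Inf, 3, 6, Inf),
              labels = c("low", "medium", "high"))
print(res)
```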

爱*していゐ 2025-02-02 08:53:09

You need an anonymous (lambda) function, so that cut() is only called once lapply() supplies each column as x:

dat[] <- lapply(dat, function(x) cut(x,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high")))

Or pass cut itself (not a call to cut()) and supply the remaining arguments by name; lapply() forwards them to cut():

dat[] <- lapply(dat, cut, 
      breaks = c(-Inf, break_point, break_point1, Inf),
      labels = c("low", "medium", "high"))
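Both versions reuse `break_point` and `break_point1`, which were computed from `X1` only, so every column is cut at X1's thresholds. If each column should get its own thresholds, as the question's single-column example does for X1, the breakpoints can be computed inside the anonymous function. A sketch on a small made-up data frame (the values are illustrative, not from the question):

```r
# Made-up numeric data standing in for the original columns
dat <- data.frame(X1 = c(4, 8, 1, 6, 2, 9),
                  X2 = c(10, 30, 20, 60, 50, 40))

dat[] <- lapply(dat, function(x) {
  s <- sort(x)
  n <- length(x)
  # Same rule as the question, but recomputed for every column
  b1 <- s[round(1 * n / 3)]
  b2 <- s[round(2 * n / 3)]
  cut(x, breaks = c(-Inf, b1, b2, Inf),
      labels = c("low", "medium", "high"))
})
str(dat)
```

Writing `dat[] <-` rather than `dat <-` keeps `dat` a data frame: the list returned by `lapply()` is assigned into the existing columns.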

掩于岁月 2025-02-02 08:53:09

Try santoku::chop_equally():

library(santoku)
dat[] <- apply(dat, 2, santoku::chop_equally, groups = 3,
               labels = c("low", "medium", "high"))

      X1       X2       X3       X4       ......
 [1,] "low"    "high"   "low"    "medium" ......
 [2,] "high"   "high"   "low"    "low"    ......
 [3,] "high"   "high"   "high"   "low"    ......
 ......
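For readers who would rather avoid the santoku dependency, a rough base-R counterpart takes per-column quantiles with `quantile()` and passes them to `cut()`. This is a sketch under stated assumptions, not a reimplementation of santoku: it uses `quantile()`'s default type 7, and it will stop with "'breaks' are not unique" if a column has enough ties that two quantiles coincide. Using `lapply()` instead of `apply()` also keeps each column a factor rather than going through a character matrix.

```r
# Made-up data for illustration
dat <- data.frame(X1 = c(3, 9, 1, 7, 5, 11),
                  X2 = c(2, 4, 6, 8, 10, 12))

dat[] <- lapply(dat, function(x) {
  # Per-column terciles: the 0%, 33.3%, 66.7% and 100% quantiles
  br <- quantile(x, probs = 0:3 / 3)
  # include.lowest = TRUE keeps the minimum, which sits exactly on the first break
  cut(x, breaks = br, include.lowest = TRUE,
      labels = c("low", "medium", "high"))
})
str(dat)
```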

Note that chop_equally() creates separate breakpoints for each column, based on that column's quantiles.
If you want the same breakpoints for every column, compute them once from all values pooled together:

breaks <- quantile(as.matrix(dat), 0:3/3)
dat[] <- apply(dat, 2, cut, breaks = breaks, include.lowest = TRUE,
               labels = c("low", "medium", "high"))
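Why `include.lowest = TRUE` matters with quantile breaks: `quantile(..., 0:3/3)` returns the minimum and maximum as the outer breaks, and `cut()` builds right-closed intervals `(a, b]` by default, so the smallest value falls outside the first interval and becomes `NA`. A minimal check on a made-up vector:

```r
x <- c(0, 2, 5, 7, 10)
br <- quantile(x, 0:3 / 3)   # outer breaks are min(x) = 0 and max(x) = 10

open_low   <- cut(x, breaks = br)                         # default include.lowest = FALSE
closed_low <- cut(x, breaks = br, include.lowest = TRUE)  # first interval becomes [0, b1]

print(open_low)    # the 0 is NA here
print(closed_low)  # every value is assigned
```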

Also, you said you wanted to chop by interval width (all intervals equally wide), but your example chops by quantiles (an equal number of values in each interval). If you want equal-width intervals, use santoku::chop_evenly().
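If equal-width bins are the goal and you would rather stay in base R, `cut()` also accepts a single number for `breaks`, in which case it splits the range of `x` into that many equal-width intervals (per column, when used inside `lapply()`). A sketch on made-up data:

```r
dat <- data.frame(X1 = c(0, 1, 4, 5, 9, 10),
                  X2 = c(0, 1, 2, 2.5, 4, 9))

# breaks = 3: three equal-width intervals spanning each column's own range
dat[] <- lapply(dat, cut, breaks = 3, labels = c("low", "medium", "high"))
str(dat)
```

In X2 the single large value stretches the range, so most of its values land in "low": equal-width bins do not give equal counts, which is exactly the difference from the quantile-based approach above.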
