findInterval() when x lies on the upper interval border

Posted 2025-01-11 04:03:22 | 1839 words | 1 view | 0 comments

I need to get the interval borders from cut() output. I found this question, which suggests using findInterval(), but it does not work as expected when a value of x equals the upper border of its cut(x) interval. See here:

x <- 1:3
breaks <- c(min(x), 2, max(x))
interval <- findInterval(x, breaks)

data.frame(x,
           groups= cut(x, breaks, include.lowest= TRUE),
           x_lower= breaks[interval],
           x_upper= breaks[interval + 1],
           interval)

  x groups x_lower x_upper interval
1 1  [1,2]       1       2        1
2 2  [1,2]       2       3        2
3 3  (2,3]       3      NA        3

I am happy with how cut() builds groups from x, but x_lower and x_upper in rows 2 and 3 are not what I expect. In row 2, x is 2 and groups is [1,2], so I expect x_lower to be 1 and x_upper to be 2. In row 3, x is 3 and groups is (2,3], so I expect x_lower to be 2 and x_upper to be 3. If you play around with the data you will see that whenever an x value equals the upper border of its group, findInterval() assigns it to the next interval, so x_lower and x_upper come out one interval too high. I want to avoid that. How can we achieve this?

Expected output

structure(list(x = 1:3, groups = structure(c(1L, 1L, 2L), .Label = c("[1,2]", "(2,3]"), class = "factor"), x_lower = c(1, 1, 2), x_upper = c(2, 2, 3), interval = c(1, 1, 2)), class = "data.frame", row.names = c(NA, -3L))

Remark
I do want to use findInterval(), and I cannot use labels[as.numeric(groups)] as suggested in another answer to the question above (stackoverflow.com/questions/32356108/output-a-numeric-value-from-cut-in-r). This is because in my situation x is sometimes a numeric vector and sometimes a Date/POSIXct/ts/... vector, so using as.numeric() is not safe for me.
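
One possible direction, offered as a sketch and not as a definitive answer: by default findInterval() treats intervals as closed on the left and open on the right, while cut() with its default right = TRUE does the opposite. Reasonably recent versions of R give findInterval() a left.open argument, and with left.open = TRUE the rightmost.closed argument closes the lowest interval on the left, which mirrors include.lowest = TRUE in cut(). The Date example and the names d, d_breaks, d_interval, res and d_res are illustrative assumptions, not part of the original question.

```r
# Sketch: make findInterval() follow the same (lower, upper] convention as cut().
x <- 1:3
breaks <- c(min(x), 2, max(x))

# left.open = TRUE        -> intervals are (lower, upper], like cut(right = TRUE)
# rightmost.closed = TRUE -> combined with left.open = TRUE, the lowest interval
#                            is closed on the left, like include.lowest = TRUE
interval <- findInterval(x, breaks, left.open = TRUE, rightmost.closed = TRUE)
# interval is now 1 1 2

res <- data.frame(x,
                  groups  = cut(x, breaks, include.lowest = TRUE),
                  x_lower = breaks[interval],
                  x_upper = breaks[interval + 1],
                  interval)
print(res)

# Same idea with a Date vector (illustrative data): the borders are looked up
# by indexing the breaks vector, so they keep their Date class and no
# as.numeric() conversion of labels is involved.
d <- as.Date("2025-01-01") + 0:2
d_breaks <- c(min(d), as.Date("2025-01-02"), max(d))
d_interval <- findInterval(d, d_breaks, left.open = TRUE, rightmost.closed = TRUE)

d_res <- data.frame(d,
                    d_lower = d_breaks[d_interval],
                    d_upper = d_breaks[d_interval + 1])
print(d_res)
```

A caveat on this sketch: for values outside the range of breaks, findInterval() returns 0 or length(breaks), where cut() would return NA, so such values would need separate handling before indexing.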
